Hi Weijie,

Thanks for bringing this topic up!

Basically you are right, Hive Metastore is one of the best candidates for
storing Drill's metadata.
It would also be good to introduce an abstraction that allows other kinds of
tools to be implemented and used as the Metastore.
Metastore performance can be an important question, especially for light
Drill tables.

Currently Vova and I are working on a proposal for the Metastore.
I have created Jira DRILL-6552 [1] where all the related discussions can be
held.

[1] https://issues.apache.org/jira/browse/DRILL-6552

Kind regards
Vitalii


On Thu, Jun 28, 2018 at 6:49 PM Arina Yelchiyeva <arina.yelchiy...@gmail.com>
wrote:

> Hi,
>
> Vitalii and Vova are also looking at this part, so you might want to sync up
> with them. Or even better, we can create a Jira for this and hold all
> discussions there.
> Vitalii, what do you think?
>
> Kind regards,
> Arina
>
> On Thu, Jun 28, 2018 at 6:46 PM weijie tong <tongweijie...@gmail.com>
> wrote:
>
> > Hi all,
> >
> >     @aman previously pointed me to the Drill 2.0 roadmap, which includes
> > a description of the metadata design (
> > https://lists.apache.org/thread.html/74cf48dd78d323535dc942c969e72008884e51f8715f4a20f6f8fb66@%3Cdev.drill.apache.org%3E
> > ). I am interested in taking on the implementation of the metadata part,
> > and I am starting this thread to hear your ideas about it.
> >
> >     I have looked into several open source metadata projects, such as
> > Hive Metastore (
> > https://cwiki.apache.org/confluence/display/Hive/Design#Design-Metastore
> > ), Netflix Metacat, Apache Atlas, and LinkedIn WhereHows (
> > https://github.com/linkedin/WhereHows). Except for Hive Metastore, these
> > projects define a high-level abstraction over the actual physical
> > metadata, which makes it easier to add new metadata properties. Hive
> > Metastore's design is tied to the physical metadata and offers a Thrift
> > interface for different languages, but it depends on a relational
> > database, which is not good for scalability and performance. In my
> > opinion, we should use Hive Metastore as our design template, or simply
> > reuse it, since we don't need a rich metadata management system. Maybe we
> > should change the backend database to a KV store with high query
> > performance, such as HBase.
> >
> >    Besides the metadata interface design and the choice of backend
> > storage, we should also provide an ad hoc query capability, so that users
> > can calculate statistics such as NDV and store them in the metadata. By
> > the way, maybe we can go further and integrate VerdictDB (
> > https://github.com/mozafari/verdictdb) to provide richer approximate
> > query processing.
> >
>
