Hi All,

Catching up on this old topic.

One of Drill's main differentiators is the ability to extend Drill with UDFs, 
custom storage and format plugins, custom security plugins, etc. I wonder if 
the team has considered taking a modular approach to metadata. Perhaps define a 
"metadata plugin" API. Then, allow implementations for the Hive Meta Store 
(HMS), for proprietary solutions (AtScale, Alation), or for simple ad-hoc use
cases (JSON schema files for specific collections of JSON data, or the existing
Drill Parquet metadata files).

Focusing on the API, rather than the implementation, would help Drill grow by 
allowing Drill to integrate with many different metadata systems.
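To make the idea a bit more concrete, here is a rough Java sketch of what such a plugin API might look like. It assumes Java 16+ (for records), and every name in it (MetadataPlugin, TableMetadata, ColumnMetadata, InMemoryMetadataPlugin, MetadataRegistry) is hypothetical. None of it is an existing Drill API. The in-memory implementation only stands in for something like a JSON schema file reader or an HMS client.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical: a column name plus a Drill-like type name (e.g. "VARCHAR").
record ColumnMetadata(String name, String type) {}

// Hypothetical: schema and a basic statistic for one table.
record TableMetadata(String table, List<ColumnMetadata> columns, long rowCount) {}

// Hypothetical "metadata plugin" contract. Implementations could wrap HMS,
// a proprietary catalog, JSON schema files, or Parquet metadata files.
interface MetadataPlugin {
  String name();
  Optional<TableMetadata> getTable(String table);
}

// Simplest possible implementation: metadata held in memory, standing in
// for, e.g., schema files stored next to a collection of JSON data.
final class InMemoryMetadataPlugin implements MetadataPlugin {
  private final Map<String, TableMetadata> tables = new LinkedHashMap<>();

  InMemoryMetadataPlugin register(TableMetadata meta) {
    tables.put(meta.table(), meta);
    return this;
  }

  @Override
  public String name() {
    return "in-memory";
  }

  @Override
  public Optional<TableMetadata> getTable(String table) {
    return Optional.ofNullable(tables.get(table));
  }
}

// Asks each configured plugin in order and returns the first answer, so the
// caller depends only on the interface, not on any one metadata system.
final class MetadataRegistry {
  private final List<MetadataPlugin> plugins;

  MetadataRegistry(List<MetadataPlugin> plugins) {
    this.plugins = List.copyOf(plugins);
  }

  Optional<TableMetadata> lookup(String table) {
    for (MetadataPlugin plugin : plugins) {
      Optional<TableMetadata> meta = plugin.getTable(table);
      if (meta.isPresent()) {
        return meta;
      }
    }
    return Optional.empty();
  }
}
```

A caller would build the registry from whatever plugins are configured (say, a hypothetical HMS-backed plugin followed by a file-based one) and ask it for a table's metadata. Because the registry only sees the interface, supporting a new metadata system would mean adding one class rather than changing the planner.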

Per Weijie's suggestion, I added the above as a comment to DRILL-6552, where 
I've also included a bit more detail based on the schema issues I've wrestled 
with in CSV and JSON, and in developing the "result set loader."

Thanks,
- Paul

 

    On Thursday, June 28, 2018, 8:30:52 PM PDT, weijie tong 
<tongweijie...@gmail.com> wrote:  

Hi Vitalii:

Glad to hear that you are also looking at this part. Let's keep the
discussion under that Jira.

On Fri, Jun 29, 2018 at 1:27 AM Vitalii Diravka <vitalii.dira...@gmail.com>
wrote:

> Hi Weijie,
>
> Thanks for bringing this topic up!
>
> Basically you are right: Hive Metastore is one of the best candidates for
> storing Drill's metadata.
> It would also be good to add an abstraction that allows other kinds of
> tools to be implemented and used as the Metastore.
> Metastore performance can be especially important for light Drill tables.
>
> Currently Vova and I are working on a proposal for the Metastore.
> I have created Jira DRILL-6552 [1], where all the related discussions can be
> held.
>
> [1] https://issues.apache.org/jira/browse/DRILL-6552
>
> Kind regards
> Vitalii
>
>
> On Thu, Jun 28, 2018 at 6:49 PM Arina Yelchiyeva <
> arina.yelchiy...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Vitalii and Vova are also looking at this part, so you might want to sync
> > up with them. Or, even better, we can create a Jira for this and hold all
> > discussions there.
> > Vitalii, what do you think?
> >
> > Kind regards,
> > Arina
> >
> > On Thu, Jun 28, 2018 at 6:46 PM weijie tong <tongweijie...@gmail.com>
> > wrote:
> >
> > > Hi all:
> > >
> > >    As @aman once pointed out to me, the Drill 2.0 roadmap includes a
> > > description of the metadata design (
> > > https://lists.apache.org/thread.html/74cf48dd78d323535dc942c969e72008884e51f8715f4a20f6f8fb66@%3Cdev.drill.apache.org%3E
> > > ). I am interested in taking on the implementation of the metadata
> > > part, so I am starting this thread to hear your ideas about it.
> > >
> > >    I have investigated several open-source metadata projects, such as
> > > Hive Metastore (
> > > https://cwiki.apache.org/confluence/display/Hive/Design#Design-Metastore
> > > ), Netflix Metacat, Apache Atlas, and LinkedIn WhereHows (
> > > https://github.com/linkedin/WhereHows). Except for Hive Metastore, these
> > > projects define metadata at a high level of abstraction above the actual
> > > physical metadata, which makes it easy to extend them with new metadata
> > > properties. Hive Metastore's design stays close to the physical
> > > metadata and offers a Thrift interface for different languages, but it
> > > depends on a relational database, which limits its scalability and
> > > performance. In my opinion, we should use Hive Metastore as our design
> > > template, or simply reuse it, since we don't need a rich metadata
> > > management system. Perhaps we should replace its backend database with
> > > a key-value store that offers high query performance, such as HBase.
> > >
> > >    Besides the metadata interface design and the choice of backend
> > > storage, we should also provide a random query capability, so that
> > > users can calculate statistics such as NDV (number of distinct values)
> > > and store them in the metadata. By the way, we could go further and
> > > adopt VerdictDB (https://github.com/mozafari/verdictdb) to provide
> > > richer approximate query processing.
> > >
> >
>  