[ 
https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612437#comment-16612437
 ] 

Vitalii Diravka commented on DRILL-6552:
----------------------------------------

[~weijie] [~paul-rogers]
We had a discussion in Hangouts and presented some slides there: 
https://docs.google.com/presentation/d/1m8Hxnwv3PtgIDfNsptCWwA_UYpTq_t3yyvj4P-7NuFs/edit#slide=id.p
Currently, we are working on a design doc for the Drill Metastore. 
The general ideas are: 
* define a Drill Metastore API (it will allow adding new implementations in 
the future, such as HBase, MapR-DB, etc.)
* adapt the current Parquet metadata cache files to it (this will also make it 
easier to create similar metadata cache files for other storage formats)
* add an implementation that uses HMS (Hive Metastore) in the Drill Metastore
* implement metadata collection for different storages by leveraging the 
custom operators from Gautam's work on DRILL-1328
* implement a custom JSON schema reader for exploring JSON schemas.
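
To make the first point more concrete, here is a minimal sketch of what a 
pluggable Metastore API could look like. It is an illustration only, not the 
design from the slides or the design doc: the names TableMetadata, Metastore 
and InMemoryMetastore are hypothetical and do not exist in Drill, and the 
in-memory class only stands in for a real backend such as HMS or HBase.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical value type: schema plus one basic statistic for a table.
final class TableMetadata {
    final String tableName;
    final Map<String, String> columnTypes; // column name -> type name
    final long rowCount;

    TableMetadata(String tableName, Map<String, String> columnTypes, long rowCount) {
        this.tableName = tableName;
        // Defensive copy so later changes by the caller do not leak in.
        this.columnTypes = Collections.unmodifiableMap(new HashMap<>(columnTypes));
        this.rowCount = rowCount;
    }
}

// Hypothetical storage-agnostic contract; each backend would implement it.
interface Metastore {
    void save(TableMetadata metadata);

    Optional<TableMetadata> find(String tableName);
}

// In-memory stand-in for a real backend, keyed by table name.
final class InMemoryMetastore implements Metastore {
    private final Map<String, TableMetadata> tables = new ConcurrentHashMap<>();

    @Override
    public void save(TableMetadata metadata) {
        tables.put(metadata.tableName, metadata);
    }

    @Override
    public Optional<TableMetadata> find(String tableName) {
        return Optional.ofNullable(tables.get(tableName));
    }
}
```

With a contract like this, callers would depend only on the Metastore 
interface, so a new backend could be added without changing them.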

Possible solutions for some HMS limitations: 
* HMS can store only table and partition metadata, so the schema and 
statistics of Parquet, JSON, CSV and other files will be stored as partition 
metadata. 
* HMS can store only certain kinds of column statistics, so we are considering 
other ways to store all available Drill statistics, for instance storing this 
data as table/partition properties or contributing to Hive Metastore to 
extend it.
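
As a rough sketch of the "table/partition properties" option, column 
statistics could be flattened into string key/value pairs and read back later. 
This assumes the properties are plain string-to-string maps. The class name 
StatsProperties, the "drill.stats." key prefix and the statistic names are all 
hypothetical, and no Hive Metastore API is called here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: flattens per-column statistics into string properties.
final class StatsProperties {
    // Assumed key prefix; not an existing Drill or Hive convention.
    static final String PREFIX = "drill.stats.";

    // Produces keys of the form "drill.stats.<column>.<statName>".
    static Map<String, String> encode(String column, Map<String, Double> stats) {
        Map<String, String> props = new HashMap<>();
        for (Map.Entry<String, Double> e : stats.entrySet()) {
            props.put(PREFIX + column + "." + e.getKey(), Double.toString(e.getValue()));
        }
        return props;
    }

    // Reads back the statistics of one column, ignoring unrelated properties.
    // Limitation of this sketch: column names containing '.' would be ambiguous.
    static Map<String, Double> decode(String column, Map<String, String> props) {
        String columnPrefix = PREFIX + column + ".";
        Map<String, Double> stats = new HashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            if (e.getKey().startsWith(columnPrefix)) {
                String statName = e.getKey().substring(columnPrefix.length());
                stats.put(statName, Double.parseDouble(e.getValue()));
            }
        }
        return stats;
    }
}
```

The trade-off is that statistics stored this way are opaque to Hive itself, 
which is one reason extending Hive Metastore is listed as the alternative.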


> Drill Metadata management "Drill MetaStore"
> -------------------------------------------
>
>                 Key: DRILL-6552
>                 URL: https://issues.apache.org/jira/browse/DRILL-6552
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Metadata
>    Affects Versions: 1.13.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>            Priority: Major
>             Fix For: 2.0.0
>
>
> It would be useful for Drill to have some sort of metastore which would 
> enable Drill to remember previously defined schemata so Drill doesn’t have to 
> do the same work over and over again.
> It allows storing schema and statistics, which will speed up query 
> validation, planning and execution. It also increases the stability of 
> Drill and helps avoid different kinds of issues: "schema change 
> Exceptions", "limit 0" optimization and so on. 
> One of the main candidates is Hive Metastore.
> Starting from version 3.0, Hive Metastore can run as a service separate 
> from Hive server:
> [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration]
> An optional enhancement is storing Drill's profiles, UDFs and plugin 
> configs in some kind of metastore as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
