[ 
https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579742#comment-16579742
 ] 

Volodymyr Vysotskyi commented on DRILL-6552:
--------------------------------------------

[~paul-rogers], thanks for participating in this topic.
{quote}Is this under active development? Is a design document available?
{quote}
Currently, this task is at the active investigation stage. [~vitalii] and I 
have some ideas about handling metadata and column statistics, how they may 
fit into the existing flow, and how to solve the problems that come with them. 
There is no design doc yet, but I think that before the next hangout (Aug 21) 
we will prepare a small presentation with a short description of our ideas.
{quote}I wonder if Drill could build on its storage plugin, format plugin and 
UDF model to allow metadata to be seen as an extension, customized for various 
environments.
{quote}
For now, we are focused on using a metastore to store table metadata and 
column statistics. I think that exposing the collected metadata to various 
environments depends on the underlying metadata store. In the case of HMS, 
it is possible.
{quote}And, we have often wished for a simpler, per-file (or per-table) system 
that does not need the complexity and overhead of the HMS.
{quote}
Choosing HMS as the underlying metadata store is the initial stage. This way, 
we won't be forced to solve again the problems that HMS has already 
addressed. For example, Drill's existing Parquet metadata system handles 
concurrent modification and reading poorly.
The next steps will be to provide other metastore implementations using the 
previously defined API.
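
To make the "one API, several backing stores" idea concrete, here is a minimal sketch in Python (used only for brevity; Drill itself is Java). Every name in it (`Metastore`, `TableMetadata`, `ColumnStats`, `InMemoryMetastore`) is hypothetical and is not the API we are designing. It only shows that callers depend on the interface, so an HMS-backed implementation could later sit next to a simpler one.

```python
import threading
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ColumnStats:
    """Hypothetical per-column statistics."""
    min_value: int
    max_value: int
    null_count: int = 0


@dataclass(frozen=True)
class TableMetadata:
    """Hypothetical table metadata: schema plus column statistics."""
    schema: Dict[str, str]  # column name -> type name
    column_stats: Dict[str, ColumnStats] = field(default_factory=dict)


class Metastore(ABC):
    """Hypothetical metastore API; each backing store implements it."""

    @abstractmethod
    def put(self, table: str, metadata: TableMetadata) -> None:
        ...

    @abstractmethod
    def get(self, table: str) -> Optional[TableMetadata]:
        ...


class InMemoryMetastore(Metastore):
    """Simplest possible implementation; a lock guards concurrent access."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tables: Dict[str, TableMetadata] = {}

    def put(self, table: str, metadata: TableMetadata) -> None:
        with self._lock:
            self._tables[table] = metadata

    def get(self, table: str) -> Optional[TableMetadata]:
        with self._lock:
            return self._tables.get(table)
```

A planner written against `Metastore` would not need to change when the in-memory store is swapped for another implementation.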
{quote}Plan-time metadata might include schema and statistics.
{quote}
Yes, we will be able to apply the same optimizations to other formats as to 
Parquet, and even more. Besides that, the problem of a large number of 
metadata files will be solved.
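
As an illustration of the kind of plan-time optimization meant here, the sketch below prunes files using stored min/max column statistics, similar in spirit to statistics-based pruning for Parquet. It is a Python sketch under stated assumptions: `FileStats` and `prune_files` are hypothetical names, the statistics are assumed to come from the metastore, and the filter is assumed to be a simple inclusive range on one column.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics, independent of the file format."""
    path: str
    column_ranges: Dict[str, Tuple[int, int]]  # column -> (min, max)


def prune_files(files: List[FileStats], column: str, low: int, high: int) -> List[str]:
    """Return paths of files that may contain rows with low <= column <= high."""
    kept = []
    for f in files:
        value_range = f.column_ranges.get(column)
        if value_range is None:
            # No statistics for this column: the file cannot be ruled out.
            kept.append(f.path)
            continue
        col_min, col_max = value_range
        # Keep the file only if its [min, max] range overlaps [low, high].
        if col_max >= low and col_min <= high:
            kept.append(f.path)
    return kept


files = [
    FileStats("a.csv", {"year": (2001, 2005)}),
    FileStats("b.csv", {"year": (2010, 2015)}),
    FileStats("c.csv", {}),  # statistics were never collected
]
to_scan = prune_files(files, "year", 2008, 2012)
```

With these inputs only `a.csv` is skipped; `c.csv` is still scanned because nothing is known about it.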
{quote}This, in turn, could allow Drill to generate and compile code at plan 
time, then distribute it to the workers, saving the cost of generating the (now 
identical) code in each of dozens of minor fragments.
{quote}
Thanks for this idea, it looks amazing!

> Drill Metadata management "Drill MetaStore"
> -------------------------------------------
>
>                 Key: DRILL-6552
>                 URL: https://issues.apache.org/jira/browse/DRILL-6552
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Metadata
>    Affects Versions: 1.13.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>            Priority: Major
>             Fix For: 2.0.0
>
>
> It would be useful for Drill to have some sort of metastore which would 
> enable Drill to remember previously defined schemata so Drill doesn’t have to 
> do the same work over and over again.
> It would store schema and statistics, which would speed up query 
> validation, planning and execution. It would also make Drill more stable 
> and help avoid various kinds of issues: "schema change 
> Exceptions", "limit 0" optimization and so on. 
> One of the main candidates is Hive Metastore.
> Starting from version 3.0, Hive Metastore can run as a service separate from 
> Hive server:
> [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration]
> An optional enhancement is storing Drill's profiles, UDFs and plugin configs 
> in some kind of metastore as well.


