[https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16579742#comment-16579742]
Volodymyr Vysotskyi commented on DRILL-6552:
--------------------------------------------
[~paul-rogers], thanks for joining this discussion.
{quote}Is this under active development? Is a design document available?
{quote}
Currently, this task is in the active investigation stage. [~vitalii] and I
have some ideas about handling metadata and column statistics, how they could
fit into the existing flow, and how to solve the problems that arise with them.
There is no design doc yet, but I think we will prepare a short presentation
describing our ideas before the next hangout (Aug 21).
{quote}I wonder if Drill could build on its storage plugin, format plugin and
UDF model to allow metadata to be seen as an extension, customized for various
environments.
{quote}
For now, we are focused on using the metastore to store table metadata and
column statistics. I think whether the collected metadata can be exposed to
various environments depends on the underlying metadata store. With HMS, it is
possible.
{quote}And, we have often wished for a simpler, per-file (or per-table) system
that does not need the complexity and overhead of the HMS.
{quote}
Choosing HMS as the underlying metadata store is only the initial stage. This
way, we won't be forced to solve problems that HMS has already addressed. For
example, Drill's existing Parquet metadata system handles concurrent
modification and reading poorly.
The next step will be to provide other metastore implementations behind the
previously defined API.
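To make the "one API, several backends" idea concrete, here is a minimal Java sketch. All names ({{Metastore}}, {{TableMetadata}}, {{InMemoryMetastore}}) are hypothetical and are not Drill's actual API; an HMS-backed class would simply be another implementation of the same interface. The in-memory version guards its map with a read/write lock, since safe concurrent modification and reading is exactly where the current Parquet metadata files are weak.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical per-table entry: schema plus one example column statistic. */
final class TableMetadata {
  final String tableName;
  final Map<String, String> schema;    // column name -> type name (simplified)
  final Map<String, Long> nullCounts;  // column name -> number of nulls

  TableMetadata(String tableName, Map<String, String> schema, Map<String, Long> nullCounts) {
    this.tableName = tableName;
    this.schema = Map.copyOf(schema);          // immutable copies (Java 10+)
    this.nullCounts = Map.copyOf(nullCounts);
  }
}

/** Hypothetical pluggable API; HMS would be one backend among several. */
interface Metastore {
  Optional<TableMetadata> get(String tableName);
  void put(TableMetadata metadata);
}

/** Hypothetical in-memory backend: many readers may proceed at once, writers are exclusive. */
final class InMemoryMetastore implements Metastore {
  private final Map<String, TableMetadata> tables = new HashMap<>();
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  @Override
  public Optional<TableMetadata> get(String tableName) {
    lock.readLock().lock();
    try {
      return Optional.ofNullable(tables.get(tableName));
    } finally {
      lock.readLock().unlock();
    }
  }

  @Override
  public void put(TableMetadata metadata) {
    lock.writeLock().lock();
    try {
      tables.put(metadata.tableName, metadata);  // replaces any previous entry atomically
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

In this sketch the planner would depend only on {{Metastore}}, so swapping HMS for another store would not touch planning code; that decoupling is the assumption the design rests on.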
{quote}Plan-time metadata might include schema and statistics.
{quote}
Yes, we will be able to make the same optimizations for other formats as for
Parquet, and even more. Besides that, the problem of a large number of metadata
files will be solved.
{quote}This, in turn, could allow Drill to generate and compile code at plan
time, then distribute it to the workers, saving the cost of generating the (now
identical) code in each of dozens of minor fragments.
{quote}
Thanks for this idea, it looks amazing!
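The core of that idea is "generate once per plan, reuse in every minor fragment". A minimal Java sketch of just that part, assuming generated code can be keyed by a signature string for the operator and its schema; {{GeneratedCodeCache}} and the generator function are hypothetical, not Drill classes, and serializing the code and shipping it to remote workers is out of scope here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache: each distinct operator signature is generated once, then shared. */
final class GeneratedCodeCache {
  private final Map<String, String> codeBySignature = new ConcurrentHashMap<>();
  private final AtomicInteger generations = new AtomicInteger();
  private final Function<String, String> generator;  // stands in for real code generation

  GeneratedCodeCache(Function<String, String> generator) {
    this.generator = generator;
  }

  /** Returns code for the signature, invoking the generator only on the first request. */
  String codeFor(String signature) {
    return codeBySignature.computeIfAbsent(signature, sig -> {
      generations.incrementAndGet();  // counts how often generation actually ran
      return generator.apply(sig);
    });
  }

  int generationCount() {
    return generations.get();
  }
}
```

With dozens of minor fragments asking for the same signature, the generator runs once instead of once per fragment.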
> Drill Metadata management "Drill MetaStore"
> -------------------------------------------
>
> Key: DRILL-6552
> URL: https://issues.apache.org/jira/browse/DRILL-6552
> Project: Apache Drill
> Issue Type: New Feature
> Components: Metadata
> Affects Versions: 1.13.0
> Reporter: Vitalii Diravka
> Assignee: Vitalii Diravka
> Priority: Major
> Fix For: 2.0.0
>
>
> It would be useful for Drill to have some sort of metastore which would
> enable Drill to remember previously defined schemata so Drill doesn’t have to
> do the same work over and over again.
> It allows storing schema and statistics, which will speed up query
> validation, planning and execution. It also increases the stability of Drill
> and helps avoid various kinds of issues: "schema change Exceptions", "limit 0"
> optimization and so on.
> One of the main candidates is Hive Metastore.
> Starting from version 3.0, Hive Metastore can run as a service separate from
> Hive server:
> [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration]
> An optional enhancement is to store Drill's profiles, UDFs, and plugin configs
> in some kind of metastore as well.