[ https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621266#comment-16621266 ]
Gautam Kumar Parai commented on DRILL-6552:
-------------------------------------------
I would like to mention that two-phase aggregation, along with custom operators
for computing statistics (instead of e.g. count(*)), was done as part of
DRILL-1328, similar to the approach suggested by [~okalinin]. However, the perf
numbers were nowhere near earth-shattering :(
The future improvements identified were either to adopt a multi-phase agg
approach or to use sampling in order to speed it up further. Another option would
be to revisit the code to see if we can speed up the existing implementation
further. [~paul-rogers] had reviewed the code at the time - he is certainly far
more versed in execution efficiency than I am. Any suggestions, Paul and
others?
[~vitalii] in addition to the metadata-at-scale problem, we should also consider
functional completeness. For performance benchmarks like TPC-H/TPC-DS, we
had identified histograms as critical for improving planning. When you and
[~vvysotskyi] last presented the proposal, it seemed that another
limitation of HMS would be the inability to store histograms. Do you have a
proposal or workaround for handling histograms - or is it not feasible at all?
> Drill Metadata management "Drill MetaStore"
> -------------------------------------------
>
> Key: DRILL-6552
> URL: https://issues.apache.org/jira/browse/DRILL-6552
> Project: Apache Drill
> Issue Type: New Feature
> Components: Metadata
> Affects Versions: 1.13.0
> Reporter: Vitalii Diravka
> Assignee: Vitalii Diravka
> Priority: Major
> Fix For: 2.0.0
>
>
> It would be useful for Drill to have some sort of metastore which would
> enable Drill to remember previously defined schemata so Drill doesn’t have to
> do the same work over and over again.
> It would allow Drill to store schema and statistics, which would speed up
> query validation, planning and execution. It would also increase the stability
> of Drill and help avoid various kinds of issues: "schema change
> Exceptions", "limit 0" optimization and so on.
> One of the main candidates is Hive Metastore.
> Starting with version 3.0, Hive Metastore can run as a service separate from
> Hive server:
> [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration]
> An optional enhancement is storing Drill's profiles, UDFs, and plugin configs
> in some kind of metastore as well.