[ https://issues.apache.org/jira/browse/DRILL-6552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621266#comment-16621266 ]

Gautam Kumar Parai commented on DRILL-6552:
-------------------------------------------

I would like to mention that two-phase aggregation, along with custom operators 
for computing statistics (instead of, e.g., count(*)), was implemented as part of 
DRILL-1328, similar to the approach suggested by [~okalinin]. However, the 
performance numbers were nowhere near earth-shattering :(

The future improvements we identified were either a multi-phase aggregation 
approach OR sampling, to speed it up further. Another option would be to 
revisit the code to see whether we can speed up the existing implementation. 
[~paul-rogers] reviewed the code at the time; he is certainly a ton more 
versed in execution efficiency than I am. Any suggestions, Paul and others?

[~vitalii], in addition to the metadata-at-scale problem, we should also consider 
functional completeness. For performance benchmarks like TPC-H/TPC-DS, we 
had identified histograms as critical for improving planning. When you and 
[~vvysotskyi] presented the proposal last time, it seemed that another 
limitation of HMS would be its inability to store histograms. Do you have a 
proposal or workaround for handling histograms, or is it not feasible at all?

> Drill Metadata management "Drill MetaStore"
> -------------------------------------------
>
>                 Key: DRILL-6552
>                 URL: https://issues.apache.org/jira/browse/DRILL-6552
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Metadata
>    Affects Versions: 1.13.0
>            Reporter: Vitalii Diravka
>            Assignee: Vitalii Diravka
>            Priority: Major
>             Fix For: 2.0.0
>
>
> It would be useful for Drill to have some sort of metastore which would 
> enable Drill to remember previously defined schemata so Drill doesn’t have to 
> do the same work over and over again.
> It would allow Drill to store schema and statistics, which would accelerate 
> query validation, planning, and execution. It would also increase the 
> stability of Drill and help avoid various kinds of issues: "schema change 
> exceptions", "limit 0" optimization, and so on. 
> One of the main candidates is Hive Metastore.
> Starting from version 3.0, Hive Metastore can run as a service separate from 
> the Hive server:
> [https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+3.0+Administration]
> An optional enhancement is storing Drill's profiles, UDFs, and plugin configs in 
> some kind of metastore as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
