[ https://issues.apache.org/jira/browse/DRILL-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Volodymyr Vysotskyi updated DRILL-7064:
---------------------------------------
Labels: ready-to-commit (was: )
> Leverage the summary's totalRowCount and totalNullCount for COUNT() queries (also prevent eager expansion of files)
> -------------------------------------------------------------------------------------------------------------------
>
> Key: DRILL-7064
> URL: https://issues.apache.org/jira/browse/DRILL-7064
> Project: Apache Drill
> Issue Type: Sub-task
> Components: Metadata
> Reporter: Venkata Jyothsna Donapati
> Assignee: Aman Sinha
> Priority: Major
> Labels: ready-to-commit
> Fix For: 1.16.0
>
> Original Estimate: 336h
> Remaining Estimate: 336h
>
> This sub-task is meant to leverage the Parquet metadata cache's summary
> stats, namely totalRowCount (across all files and row groups) and the
> per-column totalNullCount (across all files and row groups), to answer plain
> COUNT aggregation queries without a GROUP BY. Such queries are currently
> converted to a DirectScan by ConvertCountToDirectScanRule, which uses the row
> group metadata. However, this rule is applied to Drill logical rels and
> converts the logical plan into a physical plan with a DirectScanPrel, and by
> then it is too late: the DrillScanRel created during logical planning has
> already read the entire metadata cache file, including its full list of row
> group entries. Because the metadata cache file can grow quite large, this
> approach does not scale.
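The arithmetic behind answering COUNT from summary stats can be sketched in Java (16+, for records). This is a minimal sketch under stated assumptions, not Drill's implementation: the `MetadataSummary` record, its map of per-column null counts, and the choice to treat a missing or negative null count as "unknown" are all hypothetical. COUNT(*) is simply totalRowCount; COUNT(col) skips NULLs, so it is totalRowCount minus that column's totalNullCount.

```java
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical model of the summary stats described above; not Drill's actual class.
record MetadataSummary(long totalRowCount, Map<String, Long> totalNullCount) {}

public class SummaryCount {

    // COUNT(*) counts every row, so it is exactly the summary's totalRowCount.
    static long countStar(MetadataSummary summary) {
        return summary.totalRowCount();
    }

    // COUNT(col) skips NULLs, so it equals totalRowCount minus the column's
    // totalNullCount. A missing entry or a negative value is treated as
    // "unknown" (an assumption of this sketch): the count cannot be answered
    // from metadata and the query would have to scan instead.
    static OptionalLong countColumn(MetadataSummary summary, String column) {
        Long nulls = summary.totalNullCount().get(column);
        if (nulls == null || nulls < 0) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(summary.totalRowCount() - nulls);
    }

    public static void main(String[] args) {
        MetadataSummary summary = new MetadataSummary(1_000L, Map.of("a", 0L, "b", 250L));
        System.out.println(countStar(summary));        // 1000
        System.out.println(countColumn(summary, "b")); // OptionalLong[750]
        System.out.println(countColumn(summary, "c")); // OptionalLong.empty
    }
}
```

Returning an empty `OptionalLong` rather than a guessed value keeps the sketch conservative: the shortcut is taken only when the metadata fully determines the answer.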
> The solution is to use the Metadata Summary file created in DRILL-7063 and
> to add a new rule that applies early in planning, so that it operates on the
> Calcite logical rels instead of the Drill logical rels and thereby prevents
> eager expansion of the list of files and row groups.
> We will not remove the existing rule. It will continue to operate as before,
> because after some transformations we may still want to apply the
> optimizations for COUNT queries.
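To illustrate why an early rule avoids eager expansion, the sketch below contrasts two hypothetical planning paths in Java (16+, for records). Neither is one of Drill's rule classes: `earlyCount` reads only summary totals, while `lateCount` stands in for the existing rule and is modeled, as an assumption, as summing per-row-group counts. The row-group list is passed as a `Supplier`, so it is loaded only when no summary is available and the planner has to fall back.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

public class CountPlanningSketch {

    // Hypothetical per-row-group stats for one column, as held in the full metadata cache file.
    record RowGroupStats(long rowCount, long nullCount) {}

    // Hypothetical summary totals for the same column.
    record Summary(long totalRowCount, long totalNullCount) {}

    // Early path: answers COUNT(col) from the summary alone, without touching
    // any row-group entries. Returns empty (declines) when no summary exists.
    static Optional<Long> earlyCount(Optional<Summary> summary) {
        return summary.map(s -> s.totalRowCount() - s.totalNullCount());
    }

    // Late path: sums per-row-group stats, so the whole row-group list
    // must already be in memory.
    static long lateCount(List<RowGroupStats> rowGroups) {
        long rows = 0;
        long nulls = 0;
        for (RowGroupStats rg : rowGroups) {
            rows += rg.rowCount();
            nulls += rg.nullCount();
        }
        return rows - nulls;
    }

    // Tries the early path first; the row groups are loaded only on fallback.
    static long count(Optional<Summary> summary, Supplier<List<RowGroupStats>> loadRowGroups) {
        return earlyCount(summary).orElseGet(() -> lateCount(loadRowGroups.get()));
    }

    public static void main(String[] args) {
        Supplier<List<RowGroupStats>> loader = () -> {
            System.out.println("loading all row-group entries");
            return List.of(new RowGroupStats(600, 100), new RowGroupStats(400, 150));
        };
        System.out.println(count(Optional.of(new Summary(1_000, 250)), loader)); // 750, loader not called
        System.out.println(count(Optional.empty(), loader)); // prints the loading message, then 750
    }
}
```

Both paths return the same count when the metadata is consistent; the difference is that the summary path never materializes the row-group list, while the fallback keeps COUNT optimizable when no summary can be used.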