[
https://issues.apache.org/jira/browse/DRILL-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vova Vysotskyi updated DRILL-7418:
----------------------------------
Labels: ready-to-commit (was: )
> MetadataDirectGroupScan improvements
> ------------------------------------
>
> Key: DRILL-7418
> URL: https://issues.apache.org/jira/browse/DRILL-7418
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.16.0
> Reporter: Arina Ielchiieva
> Assignee: Arina Ielchiieva
> Priority: Minor
> Labels: ready-to-commit
> Fix For: 1.17.0
>
>
> When a count query is converted to a direct scan (that is, when statistics or
> table metadata are available and the count operation itself does not need to
> run), {{MetadataDirectGroupScan}} is used. Proposed
> {{MetadataDirectGroupScan}} enhancements:
> 1. Show the table selection root instead of listing all table files. If a
> table has many files, the query plan is polluted with the full file
> enumeration. Since only metadata is used for the calculation, not the files
> themselves, the file list is not relevant and can be excluded from the plan.
> Before:
> {noformat}
> | 00-00 Screen
> 00-01 Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02 DirectScan(groupscan=[files =
> [/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_0.parquet,
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_5.parquet,
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_4.parquet,
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_9.parquet,
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_3.parquet,
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_6.parquet,
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_7.parquet,
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_10.parquet,
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_2.parquet,
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_1.parquet,
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_8.parquet],
> numFiles = 11, usedMetadataSummaryFile = false,
> DynamicPojoRecordReader{records = [[1560060, 2880404, 2880404, 0]]}])
> {noformat}
> After:
> {noformat}
> | 00-00 Screen
> 00-01 Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02 DirectScan(groupscan=[selectionRoot =
> /drill/testdata/metadata_cache/store_sales_null_blocks_all, numFiles = 11,
> usedMetadataSummaryFile = false, DynamicPojoRecordReader{records = [[1560060,
> 2880404, 2880404, 0]]}])
> {noformat}
> For Hive tables that were scanned directly, the selection root is not
> available and is therefore omitted.
> 2. Submission of a physical plan that contains {{MetadataDirectGroupScan}}
> fails with deserialization errors. Proper serialization / deserialization
> (ser / de) should be implemented.
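
The plan-text change proposed in item 1 can be sketched roughly as follows. This is an illustration only, not Drill's actual code: the class DirectScanDigestSketch and its digest method are hypothetical names, and the output format is simply copied from the "After" plan above. The selection root is treated as optional so that the Hive case, where it is unavailable, omits the field.

```java
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of Apache Drill.
final class DirectScanDigestSketch {

  // Builds the descriptive text shown inside groupscan=[...].
  // selectionRoot may be null (for example, a Hive table scanned directly),
  // in which case the field is left out instead of printing a placeholder.
  static String digest(String selectionRoot, int numFiles, boolean usedMetadataSummaryFile) {
    StringJoiner parts = new StringJoiner(", ");
    if (selectionRoot != null) {
      parts.add("selectionRoot = " + selectionRoot);
    }
    parts.add("numFiles = " + numFiles);
    parts.add("usedMetadataSummaryFile = " + usedMetadataSummaryFile);
    return parts.toString();
  }

  public static void main(String[] args) {
    // Table with a known selection root: one short entry replaces the file list.
    System.out.println(
        digest("/drill/testdata/metadata_cache/store_sales_null_blocks_all", 11, false));
    // No selection root available: the field is omitted.
    System.out.println(digest(null, 11, false));
  }
}
```

The point of the sketch is that the text length no longer grows with the number of files, because only the count and the root are printed.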
--
This message was sent by Atlassian Jira
(v8.3.4#803005)