[ 
https://issues.apache.org/jira/browse/DRILL-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vova Vysotskyi updated DRILL-7418:
----------------------------------
    Labels: ready-to-commit  (was: )

> MetadataDirectGroupScan improvements
> ------------------------------------
>
>                 Key: DRILL-7418
>                 URL: https://issues.apache.org/jira/browse/DRILL-7418
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.16.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Minor
>              Labels: ready-to-commit
>             Fix For: 1.17.0
>
>
> When a count is converted to a direct scan (the case when statistics or table 
> metadata are available and the count operation does not need to be performed), 
> {{MetadataDirectGroupScan}} is used. Proposed {{MetadataDirectGroupScan}} 
> enhancements:
> 1. Show the table selection root instead of listing all table files. If a table 
> has many files, the query plan gets polluted with the enumeration of every file. 
> Since only metadata is used for the calculation, the files are not relevant and 
> can be excluded from the plan.
> Before:
> {noformat}
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02        DirectScan(groupscan=[files = 
> [/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_0.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_5.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_4.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_9.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_3.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_6.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_7.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_10.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_2.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_1.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_8.parquet], 
> numFiles = 11, usedMetadataSummaryFile = false, 
> DynamicPojoRecordReader{records = [[1560060, 2880404, 2880404, 0]]}])
> {noformat}
> After:
> {noformat}
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02        DirectScan(groupscan=[selectionRoot = 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all, numFiles = 11, 
> usedMetadataSummaryFile = false, DynamicPojoRecordReader{records = [[1560060, 
> 2880404, 2880404, 0]]}])
> {noformat}
> For Hive tables that are scanned directly, the selection root is not available 
> and is therefore omitted.
> 2. Submitting a physical plan that contains {{MetadataDirectGroupScan}} 
> fails with deserialization errors; proper serialization and deserialization 
> (ser / de) should be implemented.
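
The plan text change in item 1 can be sketched roughly as follows. This is only an illustration and not Drill's actual implementation: the class name ScanDigestSketch and the helper digest(...) are hypothetical, and the field names are taken from the "After" plan output quoted above. The selection root is modeled as an Optional so that the Hive case, where no root is available, simply omits the field.

```java
import java.util.Optional;

public class ScanDigestSketch {

    // Hypothetical helper (not Drill's API): builds the text shown inside
    // DirectScan(groupscan=[...]) without enumerating individual files.
    static String digest(Optional<String> selectionRoot, int numFiles,
                         boolean usedMetadataSummaryFile, String reader) {
        StringBuilder sb = new StringBuilder();
        // When no selection root is available (e.g. Hive tables scanned
        // directly), the field is left out of the plan text.
        selectionRoot.ifPresent(root ->
            sb.append("selectionRoot = ").append(root).append(", "));
        sb.append("numFiles = ").append(numFiles).append(", ");
        sb.append("usedMetadataSummaryFile = ").append(usedMetadataSummaryFile).append(", ");
        sb.append(reader);
        return sb.toString();
    }

    public static void main(String[] args) {
        String reader = "DynamicPojoRecordReader{records = [[1560060, 2880404, 2880404, 0]]}";
        String root = "/drill/testdata/metadata_cache/store_sales_null_blocks_all";
        // With a selection root: a single path replaces the file list.
        System.out.println(digest(Optional.of(root), 11, false, reader));
        // Without a selection root: the field is omitted entirely.
        System.out.println(digest(Optional.empty(), 11, false, reader));
    }
}
```

The plan text stays the same length regardless of how many files the table has, because only the root path and the file count are printed.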



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
