[ 
https://issues.apache.org/jira/browse/DRILL-7418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-7418:
------------------------------------
    Description: 
When a count is converted to a direct scan (the case when statistics or table 
metadata are available, so there is no need to perform the count operation), 
{{MetadataDirectGroupScan}} is used. Proposed {{MetadataDirectGroupScan}} 
enhancements:
1. Show the table selection root instead of listing all table files. If a table 
has many files, the query plan gets polluted with the enumeration of every file. 
Since the files are not used for the calculation (only the metadata is), they 
are not relevant and can be excluded from the plan.

Before:
{noformat}
| 00-00    Screen
00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
00-02        DirectScan(groupscan=[files = 
[/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_0.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_5.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_4.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_9.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_3.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_6.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_7.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_10.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_2.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_1.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_8.parquet], 
numFiles = 11, usedMetadataSummaryFile = false, DynamicPojoRecordReader{records 
= [[1560060, 2880404, 2880404, 0]]}])
{noformat}


After:
{noformat}
| 00-00    Screen
00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
00-02        DirectScan(groupscan=[selectionRoot = 
/drill/testdata/metadata_cache/store_sales_null_blocks_all, numFiles = 11, 
usedMetadataSummaryFile = false, DynamicPojoRecordReader{records = [[1560060, 
2880404, 2880404, 0]]}])
{noformat}

For Hive tables that were scanned directly, the selection root is not available 
and will therefore be omitted.

2. Submitting a physical plan that contains {{MetadataDirectGroupScan}} fails 
with deserialization errors; proper ser / de (serialization / deserialization) 
should be implemented.

  was:
When a count is converted to a direct scan (the case when statistics or table 
metadata are available, so there is no need to perform the count operation), 
{{MetadataDirectGroupScan}} is used. Proposed {{MetadataDirectGroupScan}} 
enhancements:
1. Show the table selection root instead of listing all table files. If a table 
has many files, the query plan gets polluted with the enumeration of every file. 
Since the files are not used for the calculation (only the metadata is), they 
are not relevant and can be excluded from the plan.

Before:
{noformat}
| 00-00    Screen
00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
00-02        DirectScan(groupscan=[files = 
[/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_0.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_5.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_4.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_9.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_3.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_6.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_7.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_10.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_2.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_1.parquet, 
/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_8.parquet], 
numFiles = 11, usedMetadataSummaryFile = false, DynamicPojoRecordReader{records 
= [[1560060, 2880404, 2880404, 0]]}])
{noformat}


After:
{noformat}
| 00-00    Screen
00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
00-02        DirectScan(groupscan=[selectionRoot = 
/drill/testdata/metadata_cache/store_sales_null_blocks_all, numFiles = 11, 
usedMetadataSummaryFile = false, DynamicPojoRecordReader{records = [[1560060, 
2880404, 2880404, 0]]}])
{noformat}

2. Submitting a physical plan that contains {{MetadataDirectGroupScan}} fails 
with deserialization errors; proper ser / de (serialization / deserialization) 
should be implemented.


> MetadataDirectGroupScan improvements
> ------------------------------------
>
>                 Key: DRILL-7418
>                 URL: https://issues.apache.org/jira/browse/DRILL-7418
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.16.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>            Priority: Minor
>             Fix For: 1.17.0
>
>
> When a count is converted to a direct scan (the case when statistics or table 
> metadata are available, so there is no need to perform the count operation), 
> {{MetadataDirectGroupScan}} is used. Proposed {{MetadataDirectGroupScan}} 
> enhancements:
> 1. Show the table selection root instead of listing all table files. If a 
> table has many files, the query plan gets polluted with the enumeration of 
> every file. Since the files are not used for the calculation (only the 
> metadata is), they are not relevant and can be excluded from the plan.
> Before:
> {noformat}
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02        DirectScan(groupscan=[files = 
> [/drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_0.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_5.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_4.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_9.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_3.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_6.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_7.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_10.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_2.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_1.parquet, 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all/0_0_8.parquet], 
> numFiles = 11, usedMetadataSummaryFile = false, 
> DynamicPojoRecordReader{records = [[1560060, 2880404, 2880404, 0]]}])
> {noformat}
> After:
> {noformat}
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02        DirectScan(groupscan=[selectionRoot = 
> /drill/testdata/metadata_cache/store_sales_null_blocks_all, numFiles = 11, 
> usedMetadataSummaryFile = false, DynamicPojoRecordReader{records = [[1560060, 
> 2880404, 2880404, 0]]}])
> {noformat}
> For Hive tables that were scanned directly, the selection root is not 
> available and will therefore be omitted.
> 2. Submitting a physical plan that contains {{MetadataDirectGroupScan}} 
> fails with deserialization errors; proper ser / de (serialization / 
> deserialization) should be implemented.
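
The plan-text change proposed in item 1 can be illustrated with a minimal Java sketch. It is not Drill's actual implementation: the class name {{DirectScanDigestSketch}}, its fields, and the {{digest()}} method are hypothetical and exist only to show the idea of printing the selection root in place of the file list, and of leaving it out when it is not available (as for directly scanned Hive tables). The exact plan text for the case without a selection root is an assumption.

```java
import java.util.List;

/**
 * Hypothetical stand-in for a metadata-based direct scan.
 * Not Drill's MetadataDirectGroupScan; names and fields are assumptions.
 */
public class DirectScanDigestSketch {

    private final String selectionRoot; // may be null, e.g. Hive tables scanned directly
    private final int numFiles;
    private final boolean usedMetadataSummaryFile;
    private final List<Long> records;

    public DirectScanDigestSketch(String selectionRoot, int numFiles,
                                  boolean usedMetadataSummaryFile, List<Long> records) {
        this.selectionRoot = selectionRoot;
        this.numFiles = numFiles;
        this.usedMetadataSummaryFile = usedMetadataSummaryFile;
        this.records = records;
    }

    /** Builds the plan text: selection root instead of every file path. */
    public String digest() {
        StringBuilder sb = new StringBuilder("DirectScan(groupscan=[");
        // Omit the selection root entirely when it is not available.
        if (selectionRoot != null) {
            sb.append("selectionRoot = ").append(selectionRoot).append(", ");
        }
        sb.append("numFiles = ").append(numFiles)
          .append(", usedMetadataSummaryFile = ").append(usedMetadataSummaryFile)
          .append(", DynamicPojoRecordReader{records = [").append(records).append("]}])");
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Long> counts = List.of(1560060L, 2880404L, 2880404L, 0L);

        // File-system table: the selection root is known.
        System.out.println(new DirectScanDigestSketch(
            "/drill/testdata/metadata_cache/store_sales_null_blocks_all",
            11, false, counts).digest());

        // Directly scanned Hive table: no selection root to show.
        System.out.println(new DirectScanDigestSketch(null, 11, false, counts).digest());
    }
}
```

The plan line stays the same length regardless of how many files the table has, since only the file count is printed.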



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
