[jira] [Commented] (DRILL-6199) Filter push down doesn't work with more than one nested subqueries

Anton Gozhiy (JIRA) Thu, 01 Mar 2018 05:42:38 -0800

    [ 
https://issues.apache.org/jira/browse/DRILL-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382002#comment-16382002
 ]


Anton Gozhiy commented on DRILL-6199:
-------------------------------------

Additional cases where this issue is reproduced:
*Partition pruning:*
- *Data:*
{code:sql}
create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_files` (c1, c2, c3, c4, 
c5) partition by (c1) as select cast(columns[0] as int) c1, columns[1] c2, 
columns[2] c3, columns[3] c4, columns[4] c5 from 
dfs.tmp.`DRILL_6118_data_source.csv`;
{code}
- *Query:*
{code:sql}
explain plan for select * from (select * from (select * from 
dfs.tmp.`DRILL_6118_parquet_partitioned_by_files`)) where c1 between 2 and 4
{code}
- *Expected result:*
numFiles=3, numRowGroups=3 (scanning 3 partitions)

- *Actual result:*
numFiles=1, numRowGroups=5 (scanning all partitions)

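*Directory pruning:*
- *Data:*
The DRILL_6118_parquet_partitioned_by_folders directory (folders d1, d2 and d3) created by the data gen commands in the issue description.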
- *Query:*
{code:sql}
explain plan for select * from (select * from (select * from 
dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where dir0='d2'
{code}

- *Expected result:*
numFiles=1, numRowGroups=1

- *Actual result:*
numFiles=3, numRowGroups=3
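
*Control queries (a suggested sketch, not run as part of this report):*
To confirm that the problem is tied to the nesting depth rather than to pruning itself, the same filters can be applied without any subqueries. This assumes, as the issue title suggests, that pruning works when there is at most one level of nesting; the comments only restate the expected results listed above.
{code:sql}
-- Partition pruning without nesting: if pruning is applied,
-- the plan should match the expected result above (numFiles=3, numRowGroups=3).
explain plan for select * from
dfs.tmp.`DRILL_6118_parquet_partitioned_by_files` where c1 between 2 and 4;

-- Directory pruning without nesting: if pruning is applied,
-- the plan should match the expected result above (numFiles=1, numRowGroups=1).
explain plan for select * from
dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders` where dir0='d2';
{code}
If these plans show the pruned counts while the doubly nested versions do not, the difference can be attributed to the extra subquery level.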

> Filter push down doesn't work with more than one nested subqueries
> ------------------------------------------------------------------
>
>                 Key: DRILL-6199
>                 URL: https://issues.apache.org/jira/browse/DRILL-6199
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.13.0
>            Reporter: Anton Gozhiy
>            Assignee: Arina Ielchiieva
>            Priority: Major
>             Fix For: 1.13.0
>
>         Attachments: DRILL_6118_data_source.csv
>
>
> *Data set:*
> The data is generated using the attached file: *DRILL_6118_data_source.csv*
> Data gen commands:
> {code:sql}
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d1` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0] in (1, 3);
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d2` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]=2;
> create table dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders/d3` (c1, c2, 
> c3, c4, c5) as select cast(columns[0] as int) c1, columns[1] c2, columns[2] 
> c3, columns[3] c4, columns[4] c5 from dfs.tmp.`DRILL_6118_data_source.csv` 
> where columns[0]>3;
> {code}
> *Steps:*
> # Execute the following query:
> {code:sql}
> explain plan for select * from (select * from (select * from 
> dfs.tmp.`DRILL_6118_parquet_partitioned_by_folders`)) where c1<3
> {code}
> *Expected result:*
> numFiles=2, numRowGroups=2, only files from the folders d1 and d2 should be 
> scanned.
> *Actual result:*
> Filter push down doesn't work:
> numFiles=3, numRowGroups=3, files from all three folders are scanned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
