Khurram Faraaz created DRILL-3538:
-------------------------------------

             Summary: We do not prune partitions when we count over the partitioning key and filter on the partitioning key
                 Key: DRILL-3538
                 URL: https://issues.apache.org/jira/browse/DRILL-3538
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.2.0
         Environment: 4 node cluster on CentOS
            Reporter: Khurram Faraaz
            Assignee: Chris Westin


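Partitions are not pruned when a query computes a COUNT and its predicate filters on the partitioning key, whether the count is over the partitioning key, over *, or over another column (cases 1 to 3 below). Other aggregates with the same predicate, such as AVG and MIN, are pruned (cases 4 and 5). The CTAS used was: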

{code}
create table t3214 partition by (key2) as
select cast(key1 as double) key1, cast(key2 as char(1)) key2
from `twoKeyJsn.json`;
{code}
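For context, here is a minimal sketch of the pruning the planner is expected to perform, assuming that the partitioned CTAS above writes files that each hold a single key2 value. A filter key2 = 'm' should then reduce the scan to only the files whose partition value is 'm', which is what the pruned plans in cases 4 and 5 show (numFiles=1). Only /tmp/t3214/0_0_15.parquet comes from this report; the other file names, the key2 values, and the prune_files helper are hypothetical, and this is not Drill's implementation.

```python
from typing import Dict, List

# Hypothetical layout: each Parquet file written by the partitioned CTAS holds
# a single value of the partitioning column key2. Only 0_0_15.parquet appears
# in the report; the other paths and all key2 values are illustrative assumptions.
FILE_PARTITION_VALUES: Dict[str, str] = {
    "/tmp/t3214/0_0_1.parquet": "a",
    "/tmp/t3214/0_0_15.parquet": "m",
    "/tmp/t3214/0_0_20.parquet": "z",
}


def prune_files(partition_values: Dict[str, str], wanted: str) -> List[str]:
    """Hypothetical helper: keep only files whose partition value satisfies key2 = wanted."""
    return sorted(path for path, value in partition_values.items() if value == wanted)


if __name__ == "__main__":
    selected = prune_files(FILE_PARTITION_VALUES, "m")
    # With pruning, key2 = 'm' should leave a single file to scan.
    print(selected, "numFiles=%d" % len(selected))
```

The bug is that, for the COUNT queries, the plan does not end in a pruned ParquetGroupScan like this at all.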

Case 1) COUNT(key2): partitions are not pruned; the plan scans a PojoRecordReader instead of a pruned ParquetGroupScan.

{code}
0: jdbc:drill:schema=dfs.tmp> explain plan for select count(key2) from t3214 where key2 = 'm';
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[$0])
00-02        Project(EXPR$0=[$0])
00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@e2471d7])
{code}

Case 2) COUNT(*): partitions are not pruned.

{code}
0: jdbc:drill:schema=dfs.tmp> explain plan for select count(*) from t3214 where key2 = 'm';
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[$0])
00-02        Project(EXPR$0=[$0])
00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@211930a2])
{code}

Case 3) COUNT(key1), a non-partitioning column: partitions are not pruned.

{code}
0: jdbc:drill:schema=dfs.tmp> explain plan for select count(key1) from t3214 where key2 = 'm';
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[$0])
00-02        Project(EXPR$0=[$0])
00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@23fea3b0])
{code}

Case 4) AVG(key1): partitions are pruned; the scan reads only /tmp/t3214/0_0_15.parquet (numFiles=1).

{code}
0: jdbc:drill:schema=dfs.tmp> explain plan for select avg(key1) from t3214 where key2 = 'm';
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), $1)):ANY NOT NULL])
00-02        StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])
00-03          StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
00-04            Project(key1=[$1])
00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/t3214/0_0_15.parquet]], selectionRoot=maprfs:/tmp/t3214, numFiles=1, columns=[`key2`, `key1`]]])
{code}

Case 5) MIN(key1): partitions are pruned; the scan reads only /tmp/t3214/0_0_15.parquet (numFiles=1).

{code}
0: jdbc:drill:schema=dfs.tmp> explain plan for select min(key1) from t3214 where key2 = 'm';
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[$0])
00-02        StreamAgg(group=[{}], EXPR$0=[MIN($0)])
00-03          StreamAgg(group=[{}], EXPR$0=[MIN($0)])
00-04            Project(key1=[$1])
00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/t3214/0_0_15.parquet]], selectionRoot=maprfs:/tmp/t3214, numFiles=1, columns=[`key2`, `key1`]]])
{code}

Commit ID tested: 17e580a7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
