[jira] [Commented] (DRILL-3538) We do not prune partitions when we count over partitioning key and filter over partitioning key

Aman Sinha (JIRA) Sun, 01 Nov 2015 10:33:06 -0800

    [ https://issues.apache.org/jira/browse/DRILL-3538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984483#comment-14984483 ]


Aman Sinha commented on DRILL-3538:
-----------------------------------

[~khfaraaz] I am not sure why you say we are not pruning in cases 1, 2, and 3. 
The Explain output looks fine to me.  There is no Filter node in the plan, 
which indicates that the filter has been pushed into the Scan.  The reason the 
Scan shows a PojoRecordReader is that for a trivial COUNT(*) query on Parquet 
data, Drill optimizes by reading the row count directly from the metadata 
instead of computing it through a separate aggregation.  If you specifically 
want the Scan to display the attributes it shows for a regular scan, that is a 
separate issue. 
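
To make the reading of the plans above concrete, here is a minimal sketch of 
the same checks applied to the text plan. It is only a heuristic over the 
Explain text, not anything Drill provides: the function name summarize_plan 
and the result keys are hypothetical, and the two sample plans are copied 
from cases 2 and 4 of the report with the wrapped lines joined.

```python
import re

# Text plans copied from cases 2 and 4 of the report (wrapped lines joined).
PLAN_COUNT = """
00-00    Screen
00-01      Project(EXPR$0=[$0])
00-02        Project(EXPR$0=[$0])
00-03          Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@211930a2])
"""

PLAN_AVG = """
00-00    Screen
00-01      Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), $1)):ANY NOT NULL])
00-02        StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])
00-03          StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
00-04            Project(key1=[$1])
00-05              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tmp/t3214/0_0_15.parquet]], selectionRoot=maprfs:/tmp/t3214, numFiles=1, columns=[`key2`, `key1`]]])
"""


def summarize_plan(plan_text):
    """Hypothetical helper: summarize a Drill text plan with simple heuristics."""
    # Operator lines look like "00-03   Scan(...)"; capture the operator name.
    operators = re.findall(r"\d{2}-\d{2}\s+(\w+)", plan_text)
    # A regular Parquet scan reports how many files it will read.
    num_files = re.search(r"numFiles=(\d+)", plan_text)
    return {
        # A remaining Filter operator would mean the predicate was not
        # absorbed by the Scan.
        "has_filter": "Filter" in operators,
        # Assumption: a PojoRecordReader scan is the direct-from-metadata
        # count described in the comment above.
        "count_from_metadata": "PojoRecordReader" in plan_text,
        "num_files": int(num_files.group(1)) if num_files else None,
    }


if __name__ == "__main__":
    print(summarize_plan(PLAN_COUNT))
    print(summarize_plan(PLAN_AVG))
```

Neither plan contains a Filter operator. The COUNT plan shows the 
PojoRecordReader scan and therefore no file list, while the AVG plan shows a 
ParquetGroupScan with numFiles=1.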

> We do not prune partitions when we count over partitioning key and filter 
> over partitioning key
> -----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-3538
>                 URL: https://issues.apache.org/jira/browse/DRILL-3538
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>         Environment: 4 node cluster on CentOS
>            Reporter: Khurram Faraaz
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 1.3.0
>
>
> We are not partition pruning when we do a count over the partitioning key 
> and the predicate involves the partitioning key. The CTAS used was:
> {code}
> create table t3214 partition by (key2) as select cast(key1 as double) key1, 
> cast(key2 as char(1)) key2 from `twoKeyJsn.json`;
> {code}
> case 1) We do not do partition pruning in this case.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(key2) from t3214 
> where key2 = 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@e2471d7])
> {code}
> case 2) We do not do partition pruning in this case.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(*) from t3214 
> where key2 = 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@211930a2])
> {code}
> case 3) We do not do partition pruning in this case.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select count(key1) from t3214 
> where key2 = 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[$0])
> 00-03          
> Scan(groupscan=[org.apache.drill.exec.store.pojo.PojoRecordReader@23fea3b0])
> {code}
> case 4) we do prune here.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select avg(key1) from t3214 
> where key2 = 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), 
> $1)):ANY NOT NULL])
> 00-02        StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])
> 00-03          StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
> 00-04            Project(key1=[$1])
> 00-05              Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=/tmp/t3214/0_0_15.parquet]], 
> selectionRoot=maprfs:/tmp/t3214, numFiles=1, columns=[`key2`, `key1`]]])
> {code}
> case 5) we do prune here.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select min(key1) from t3214 
> where key2 = 'm';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        StreamAgg(group=[{}], EXPR$0=[MIN($0)])
> 00-03          StreamAgg(group=[{}], EXPR$0=[MIN($0)])
> 00-04            Project(key1=[$1])
> 00-05              Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=/tmp/t3214/0_0_15.parquet]], 
> selectionRoot=maprfs:/tmp/t3214, numFiles=1, columns=[`key2`, `key1`]]])
> {code}
> commit id that I am testing on : 17e580a7



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
