[ 
https://issues.apache.org/jira/browse/DRILL-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804541#comment-14804541
 ] 

Jinfeng Ni commented on DRILL-2748:
-----------------------------------

The unit test case I added in the first patch worked because the filter is 
on a partition column. Pushing the filter down leads to partition pruning, 
which in turn reduces the scan cost. Therefore, the new plan with the filter 
pushed down is estimated to have a lower cost.
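
As a rough illustration of the reasoning above, here is a toy cost comparison. 
It is only a sketch: the per-row constants, the row counts, and the 
plan_cost helper are invented for illustration and are not Drill's actual 
cost model. It only shows how a smaller scan lowers the estimated total.

```python
# Toy cost comparison. Every constant below is a made-up assumption,
# not a value taken from Drill's planner.
SCAN_COST_PER_ROW = 1.0
FILTER_COST_PER_ROW = 0.1
AGG_COST_PER_ROW = 0.5


def plan_cost(rows_scanned, rows_filtered, rows_aggregated):
    """Sum hypothetical per-operator costs for a scan/filter/aggregate plan."""
    return (rows_scanned * SCAN_COST_PER_ROW
            + rows_filtered * FILTER_COST_PER_ROW
            + rows_aggregated * AGG_COST_PER_ROW)


TOTAL_ROWS = 1_000_000             # assumed table size
GROUPS = 10_000                    # assumed number of groups leaving the aggregate
ROWS_AFTER_PRUNING = TOTAL_ROWS // 10  # assumed rows left after partition pruning

# Filter above the aggregate: scan and aggregate every row, then filter the groups.
filter_on_top = plan_cost(TOTAL_ROWS, GROUPS, TOTAL_ROWS)

# Filter pushed below the aggregate on a partition column: pruning shrinks the
# scan, so every operator above it sees fewer rows as well.
pushed_with_pruning = plan_cost(ROWS_AFTER_PRUNING, ROWS_AFTER_PRUNING,
                                ROWS_AFTER_PRUNING)

print("filter on top:      ", filter_on_top)
print("pushed with pruning:", pushed_with_pruning)
```

Under these assumed numbers the scan term dominates, so the pruned plan comes 
out cheaper mainly because fewer rows are read.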


> Filter is not pushed down into subquery with the group by
> ---------------------------------------------------------
>
>                 Key: DRILL-2748
>                 URL: https://issues.apache.org/jira/browse/DRILL-2748
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 0.9.0, 1.0.0, 1.1.0
>            Reporter: Victoria Markman
>            Assignee: Aman Sinha
>             Fix For: 1.2.0
>
>         Attachments: 
> 0001-DRILL-2748-Improve-cost-estimation-for-Drill-logical.patch
>
>
> I'm not sure about this one; theoretically, the filter could have been 
> pushed into the subquery.
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select x, y, z from (select a1, b1, avg(a1) from t1 group by a1, b1) as sq(x, y, z) where x = 10;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(x=[$0], y=[$1], z=[$2])
> 00-02        Project(x=[$0], y=[$1], z=[CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY NOT NULL])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, 10)])
> 00-05              HashAgg(group=[{0, 1}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
> 00-06                Project(a1=[$1], b1=[$0])
> 00-07                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/predicates/t1]], selectionRoot=/drill/testdata/predicates/t1, numFiles=1, columns=[`a1`, `b1`]]])
> {code}
> Same with distinct in subquery:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select x, y, z from ( select distinct a1, b1, c1 from t1 ) as sq(x, y, z) where x = 10;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(x=[$0], y=[$1], z=[$2])
> 00-02        Project(x=[$0], y=[$1], z=[$2])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, 10)])
> 00-05              HashAgg(group=[{0, 1, 2}])
> 00-06                Project(a1=[$2], b1=[$1], c1=[$0])
> 00-07                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/predicates/t1]], selectionRoot=/drill/testdata/predicates/t1, numFiles=1, columns=[`a1`, `b1`, `c1`]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
