[ 
https://issues.apache.org/jira/browse/DRILL-2748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804523#comment-14804523
 ] 

Jinfeng Ni commented on DRILL-2748:
-----------------------------------

DrillAggregateRel, which is the logical aggregate operator, currently calls 
makeTinyCost() regardless of how many rows the input has or how many 
aggregation functions it contains. As a result, the cost estimate cannot 
reflect the saving gained when the filter is pushed past the aggregate operator. 

The fix is to improve the cost estimation for the logical aggregate. As with 
the join operator, we use the same formula for HashAgg and the logical 
aggregate, under the assumption that HashAgg is more likely to be chosen 
during physical planning.
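
To illustrate why a constant aggregate cost hides the benefit of the pushdown, 
here is a minimal sketch in Python. It is not Drill's code: the constants, the 
function names, and the exact formula are assumptions made up for this 
illustration. The only idea taken from the comment above is that the aggregate 
cost should grow with the input row count and the number of aggregation 
functions instead of being a fixed tiny value.

```python
# All constants below are hypothetical; the real values used by Drill are
# not stated in this thread.
HASH_CPU_COST = 8.0      # assumed cost to hash one group-by key of one row
FUNC_CPU_COST = 1.0      # assumed cost to evaluate one aggregate call on one row
COMPARE_CPU_COST = 4.0   # assumed cost to evaluate the filter on one row
TINY_COST = 1.0          # stand-in for a constant "tiny" cost


def tiny_aggregate_cost(input_rows, num_group_keys, num_agg_calls):
    """Constant cost that ignores input size (the behaviour being fixed)."""
    return TINY_COST


def row_based_aggregate_cost(input_rows, num_group_keys, num_agg_calls):
    """Cost that scales with input rows, group keys, and aggregate calls."""
    per_row = HASH_CPU_COST * num_group_keys + FUNC_CPU_COST * num_agg_calls
    return per_row * input_rows


def filter_cost(input_rows):
    """Cost of applying the filter to every input row."""
    return COMPARE_CPU_COST * input_rows


def plan_costs(agg_cost, rows, groups, selectivity, num_group_keys, num_agg_calls):
    """Return (filter_above_aggregate, filter_pushed_below_aggregate) costs."""
    # Filter above: aggregate all rows, then filter the resulting groups.
    above = agg_cost(rows, num_group_keys, num_agg_calls) + filter_cost(groups)
    # Filter pushed: filter all rows, then aggregate only the survivors.
    pushed = filter_cost(rows) + agg_cost(
        rows * selectivity, num_group_keys, num_agg_calls
    )
    return above, pushed


if __name__ == "__main__":
    # Made-up workload: 1M rows, 100K groups, filter keeps 1% of the rows,
    # two group-by keys and two aggregate calls (as in the first plan below).
    for model in (tiny_aggregate_cost, row_based_aggregate_cost):
        above, pushed = plan_costs(model, 1_000_000, 100_000, 0.01, 2, 2)
        print(model.__name__, "above:", above, "pushed:", pushed)
```

With these made-up numbers, the constant-cost model ranks the plan with the 
filter above the aggregate as cheaper, so a cost-based planner has no reason 
to pick the pushdown. The row-based model ranks the pushed-down plan as 
cheaper, because the aggregate then processes far fewer rows.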



> Filter is not pushed down into subquery with the group by
> ---------------------------------------------------------
>
>                 Key: DRILL-2748
>                 URL: https://issues.apache.org/jira/browse/DRILL-2748
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 0.9.0, 1.0.0, 1.1.0
>            Reporter: Victoria Markman
>            Assignee: Jinfeng Ni
>             Fix For: 1.2.0
>
>         Attachments: 
> 0001-DRILL-2748-Add-optimizer-rule-to-push-filter-past-ag.patch
>
>
> I'm not sure about this one; theoretically, the filter could have been 
> pushed into the subquery.
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select x, y, z from (select a1, 
> b1, avg(a1) from t1 group by a1, b1) as sq(x, y, z) where x = 10;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(x=[$0], y=[$1], z=[$2])
> 00-02        Project(x=[$0], y=[$1], z=[CAST(/(CastHigh(CASE(=($3, 0), null, 
> $2)), $3)):ANY NOT NULL])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, 10)])
> 00-05              HashAgg(group=[{0, 1}], agg#0=[$SUM0($0)], 
> agg#1=[COUNT($0)])
> 00-06                Project(a1=[$1], b1=[$0])
> 00-07                  Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/predicates/t1]], 
> selectionRoot=/drill/testdata/predicates/t1, numFiles=1, columns=[`a1`, 
> `b1`]]])
> {code}
> Same with distinct in subquery:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select x, y, z from ( select 
> distinct a1, b1, c1 from t1 ) as sq(x, y, z) where x = 10;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(x=[$0], y=[$1], z=[$2])
> 00-02        Project(x=[$0], y=[$1], z=[$2])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[=($0, 10)])
> 00-05              HashAgg(group=[{0, 1, 2}])
> 00-06                Project(a1=[$2], b1=[$1], c1=[$0])
> 00-07                  Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/predicates/t1]], 
> selectionRoot=/drill/testdata/predicates/t1, numFiles=1, columns=[`a1`, `b1`, 
> `c1`]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
