[ 
https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947721#comment-14947721
 ] 

Ted Dunning commented on DRILL-3912:
------------------------------------

It sounds like this only deals with common sub-expressions within individual 
expressions.

A far more significant optimization would be to deal with common 
sub-expressions at a larger scale.  A classic case is a single common table 
expression that is referenced several times.  For instance,

{code}
with x as (select dir0, id from dfs.tdunning.zoom where id < 12),
     y as (select id, count(*) cnt from x group by id),
     z as (select count(distinct id) id_count from x)
select dir0, x.id, y.cnt
from x, y, z
where x.id = y.id and y.cnt / z.id_count > 3
{code}

Without good sub-expression elimination, table zoom will be scanned three 
times, once for each reference to x. Last I heard, Drill doesn't optimize 
this away.
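
To make the cost concrete, here is a toy Python model of the query above. It 
is only a sketch and says nothing about how Drill's planner works: the ZOOM 
rows, the scan counter, and the helpers scan_x, shared, run_query and 
count_scans are all hypothetical names invented for the illustration. It 
evaluates the query once with x recomputed at every reference and once with 
x computed a single time and reused.

```python
from collections import Counter

# Hypothetical stand-in for dfs.tdunning.zoom: (dir0, id) rows.
ZOOM = [("a", 1)] * 8 + [("b", 2), ("b", 20)]

stats = {"scans": 0}  # how many times the base table has been read


def scan_x():
    """CTE x: select dir0, id from zoom where id < 12."""
    stats["scans"] += 1
    return [(dir0, id_) for dir0, id_ in ZOOM if id_ < 12]


def shared(producer):
    """Wrap a producer so its result is computed once and then reused."""
    cache = []

    def wrapper():
        if not cache:
            cache.append(producer())
        return cache[0]

    return wrapper


def run_query(get_x):
    """Evaluate the query, obtaining x through get_x() at each reference."""
    # y: select id, count(*) cnt from x group by id
    y = Counter(id_ for _, id_ in get_x())
    # z: select count(distinct id) id_count from x
    id_count = len({id_ for _, id_ in get_x()})
    # final select: join x to y on id, keep rows with cnt / id_count > 3
    return [(dir0, id_, y[id_]) for dir0, id_ in get_x()
            if y[id_] / id_count > 3]


def count_scans(get_x):
    """Run the query and report (rows, number of base-table scans)."""
    stats["scans"] = 0
    rows = run_query(get_x)
    return rows, stats["scans"]


naive_rows, naive_scans = count_scans(scan_x)
shared_rows, shared_scans = count_scans(shared(scan_x))
print(naive_scans, shared_scans)  # 3 1
```

Both runs return the same rows; only the number of reads of the base table 
differs.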

> Common subexpression elimination
> --------------------------------
>
>                 Key: DRILL-3912
>                 URL: https://issues.apache.org/jira/browse/DRILL-3912
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Steven Phillips
>            Assignee: Steven Phillips
>
> Drill currently will evaluate the full expression tree, even if there are 
> redundant subtrees. Many of these redundant evaluations can be eliminated by 
> reusing the results from previously evaluated expression trees.
> For example,
> {code}
> select a + 1, (a + 1)* (a - 1) from t
> {code}
> Will compute the entire (a + 1) expression twice. With CSE, it will only be 
> evaluated once.
> The benefit will be reducing the work done when evaluating expressions, as 
> well as reducing the amount of code that is generated, which could also lead 
> to better JIT optimization.
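
For comparison, the expression-level elimination described in the issue can 
be sketched in Python as follows. This is an assumption-laden illustration, 
not Drill's implementation (the issue talks about generated code, whereas 
this sketch simply memoizes values at evaluation time). The tuple encoding of 
expressions and the names OPS, evaluate and project are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# select a + 1, (a + 1) * (a - 1) from t, encoded as nested tuples.
A_PLUS_1 = ("+", "a", 1)
PROJECTIONS = [A_PLUS_1, ("*", A_PLUS_1, ("-", "a", 1))]


def evaluate(expr, row, memo, stats):
    """Evaluate one expression tree against a row.

    memo maps already-evaluated subtrees to their values; pass None to
    disable reuse. stats["ops"] counts operator nodes actually computed.
    """
    if isinstance(expr, str):  # column reference
        return row[expr]
    if isinstance(expr, int):  # literal
        return expr
    if memo is not None and expr in memo:
        return memo[expr]  # common subexpression: reuse the earlier result
    op, left, right = expr
    stats["ops"] += 1
    value = OPS[op](evaluate(left, row, memo, stats),
                    evaluate(right, row, memo, stats))
    if memo is not None:
        memo[expr] = value
    return value


def project(row, exprs, use_cse):
    """Evaluate all projections for one row; return (values, op count)."""
    stats = {"ops": 0}
    memo = {} if use_cse else None
    values = [evaluate(e, row, memo, stats) for e in exprs]
    return values, stats["ops"]


plain_values, plain_ops = project({"a": 5}, PROJECTIONS, use_cse=False)
cse_values, cse_ops = project({"a": 5}, PROJECTIONS, use_cse=True)
print(plain_ops, cse_ops)  # 4 3
```

With reuse enabled, (a + 1) is computed once per row instead of twice, and 
the projected values are unchanged.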


