[ 
https://issues.apache.org/jira/browse/DRILL-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947906#comment-14947906
 ] 

Steven Phillips commented on DRILL-3912:
----------------------------------------

1) I had not enabled CSE in hash join, so it didn't have that problem. Now that 
I have enabled it in hash join, I am seeing the same SR error.

2) In this case, it looks like the ConstantFilter is causing the '1 + 2' and '1 
+ 3' parts of the expressions to be resolved first, so 'a + 1' is no longer a 
common subexpression. Duplicate vector reads are still removed, though. I think 
this behavior is probably fine.

3) I am not targeting this for 1.2; probably 1.3. My main motivation here was 
to solve a problem I ran into in my Union-type work. When an input has Union 
type, function resolution involves case statements that check the current type 
of the input and then execute a branch based on that type. In this case, the 
condition expression and both branches all reference the input. For example, 

1 + a

would become something like

{code}
case
  when typeOf(a) = int
    then 1 + cast(a as int)
  when typeOf(a) = varchar
    then 1 + cast(cast(a as varchar) as int)
end
{code}

So you can see that a single reference to 'a' becomes 3 references. And 'a' 
might not just be a ValueVectorReadExpression, it could be the output from some 
other expression tree. And if an input has more than 2 types, or if a function 
has multiple Union-type inputs, the complexity of the expression increases 
dramatically, and the amount of generated code gets to be quite large. I needed 
to find some way to fix this.
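
To make the cost concrete, here is a minimal Java sketch of the '1 + a' case 
above. It is only an illustration: the class and method names are hypothetical 
and are not taken from Drill, and a counting Supplier stands in for whatever 
expression tree produces 'a'. The naive version re-evaluates the input at every 
mention, while the shared version evaluates it once and reuses the value.

```java
import java.util.function.Supplier;

// Hypothetical sketch; these names are illustrative and not part of Drill.
final class UnionDispatchSketch {

    // Mirrors the expanded case statement: every mention of 'a' re-evaluates the input.
    static int onePlusNaive(Supplier<Object> a) {
        if (a.get() instanceof Integer) {
            return 1 + (Integer) a.get();
        } else if (a.get() instanceof String) {
            return 1 + Integer.parseInt((String) a.get());
        }
        throw new IllegalArgumentException("unsupported type");
    }

    // With the common subexpression shared: 'a' is evaluated once and reused.
    static int onePlusShared(Supplier<Object> a) {
        Object v = a.get();
        if (v instanceof Integer) {
            return 1 + (Integer) v;
        } else if (v instanceof String) {
            return 1 + Integer.parseInt((String) v);
        }
        throw new IllegalArgumentException("unsupported type");
    }

    // Returns {result, number of times the input expression was evaluated}.
    static int[] measure(Object value, boolean shared) {
        final int[] evaluations = {0};
        Supplier<Object> a = () -> {
            evaluations[0]++;   // count each evaluation of the input
            return value;
        };
        int result = shared ? onePlusShared(a) : onePlusNaive(a);
        return new int[] {result, evaluations[0]};
    }
}
```

For a varchar input, the naive version evaluates the input once per type check 
plus once in the branch it takes, while the shared version always evaluates it 
exactly once, regardless of how many types are checked.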



> Common subexpression elimination in code generation
> ---------------------------------------------------
>
>                 Key: DRILL-3912
>                 URL: https://issues.apache.org/jira/browse/DRILL-3912
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Steven Phillips
>            Assignee: Jinfeng Ni
>
> Drill currently will evaluate the full expression tree, even if there are 
> redundant subtrees. Many of these redundant evaluations can be eliminated by 
> reusing the results from previously evaluated expression trees.
> For example,
> {code}
> select a + 1, (a + 1) * (a - 1) from t
> {code}
> This will compute the entire (a + 1) expression twice. With CSE, it will be 
> evaluated only once.
> The benefit will be reducing the work done when evaluating expressions, as 
> well as reducing the amount of code that is generated, which could also lead 
> to better JIT optimization.
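
The reuse idea in the description can be illustrated with a small Java sketch. 
This is not Drill's code generator: the proposed change works on generated 
code, whereas this sketch interprets a toy expression tree and caches results 
per row, keyed by the structure of each subtree. All class and method names 
here are hypothetical. It evaluates the two projections from the example query 
for a = 5 and counts how many binary operations are actually computed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; these classes are illustrative and not part of Drill.
final class CseSketch {

    /** Minimal expression node: a column, a literal, or a binary operation. */
    static final class Expr {
        final String op;    // "col", "lit", "+", "-" or "*"
        final String leaf;  // column name or literal text; null for binary nodes
        final Expr left;
        final Expr right;

        Expr(String op, String leaf, Expr left, Expr right) {
            this.op = op;
            this.leaf = leaf;
            this.left = left;
            this.right = right;
        }

        static Expr col(String name) { return new Expr("col", name, null, null); }
        static Expr lit(long v) { return new Expr("lit", Long.toString(v), null, null); }
        static Expr bin(String op, Expr l, Expr r) { return new Expr(op, null, l, r); }

        /** Structural key: subtrees with the same key compute the same value. */
        String key() {
            return left == null
                    ? op + ":" + leaf
                    : "(" + left.key() + " " + op + " " + right.key() + ")";
        }
    }

    int opsEvaluated = 0; // binary operations actually computed

    /** Evaluates e for one row; a non-null cache enables reuse of repeated subtrees. */
    long eval(Expr e, Map<String, Long> row, Map<String, Long> cache) {
        if (e.op.equals("col")) return row.get(e.leaf);
        if (e.op.equals("lit")) return Long.parseLong(e.leaf);
        String key = e.key();
        if (cache != null && cache.containsKey(key)) return cache.get(key);
        long l = eval(e.left, row, cache);
        long r = eval(e.right, row, cache);
        opsEvaluated++;
        long result;
        switch (e.op) {
            case "+": result = l + r; break;
            case "-": result = l - r; break;
            case "*": result = l * r; break;
            default: throw new IllegalArgumentException("unknown op: " + e.op);
        }
        if (cache != null) cache.put(key, result);
        return result;
    }

    /** Evaluates "a + 1" and "(a + 1) * (a - 1)" for a = 5; returns {first, second, ops}. */
    static long[] run(boolean useCse) {
        Expr a = Expr.col("a");
        Expr aPlus1 = Expr.bin("+", a, Expr.lit(1));
        Expr product = Expr.bin("*",
                Expr.bin("+", a, Expr.lit(1)),
                Expr.bin("-", a, Expr.lit(1)));
        Map<String, Long> row = new HashMap<>();
        row.put("a", 5L);
        Map<String, Long> cache = useCse ? new HashMap<String, Long>() : null;
        CseSketch sketch = new CseSketch();
        long first = sketch.eval(aPlus1, row, cache);
        long second = sketch.eval(product, row, cache);
        return new long[] {first, second, sketch.opsEvaluated};
    }
}
```

Calling run(false) computes four binary operations, because (a + 1) is 
evaluated in both projections; run(true) computes three, because the second 
occurrence of (a + 1) is served from the cache. Both return the same values.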



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
