alamb commented on issue #4140:
URL: 
https://github.com/apache/arrow-datafusion/issues/4140#issuecomment-1309053379

   For what it is worth, the existing join logic contains hard coded 
assumptions that the join is between two columns in several places. Changing 
the join logic (which is already complicated and will likely only get more so) 
is likely to be quite challenging
   
   So therefore  I agree with @mingmwang's proposal of:
   
   > My current preference is to keep the equal join condition as columns, but 
add another projection to project the expression to normal columns and push 
down the projection.
   
   
   > if the expressions are trival cast, support them as equal join conditions
   
   I don't think there would be any performance difference between casting in a 
Projection or in the Join itself and I think it would keep the Join 
significantly less complicated. 
   
   I would suggest not handling casts in the Join but instead work on improving 
the other  optimizer simplification rules to remove the casts. Like the 
expression
   
   ```
   |               |   Filter: CAST(t0.c0 AS Int64) + Int64(1) = CAST(t1.c0 AS 
Int64)                                          |
   ```
   
   Can be rewritten into
   ```
   |               |   Filter: t0.c0 + Int32(1) = t1.c0                         
                  |
   ```
   
   Which we would still have to have a `projection` to evaluate `t0.c0 + 1` but 
`t1.c0` would need no processing
   
   
   This type of unwrapping is already done in 
https://github.com/apache/arrow-datafusion/blob/master/datafusion/optimizer/src/unwrap_cast_in_comparison.rs
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to