alamb commented on issue #4140:
URL:
https://github.com/apache/arrow-datafusion/issues/4140#issuecomment-1309053379
For what it is worth, the existing join logic contains hard coded
assumptions that the join is between two columns in several places. Changing
the join logic (which is already complicated and will likely only get more so)
is likely to be quite challenging
So therefore I agree with @mingmwang's proposal of:
> My current preference is to keep the equal join condition as columns, but
add another projection to project the expression to normal columns and push
down the projection.
> if the expressions are trival cast, support them as equal join conditions
I don't think there would be any performance difference between casting in a
Projection or in the Join itself and I think it would keep the Join
significantly less complicated.
I would suggest not handling casts in the Join but instead work on improving
the other optimizer simplification rules to remove the casts. Like the
expression
```
| | Filter: CAST(t0.c0 AS Int64) + Int64(1) = CAST(t1.c0 AS
Int64) |
```
Can be rewritten into
```
| | Filter: t0.c0 + Int32(1) = t1.c0
|
```
Which we would still have to have a `projection` to evaluate `t0.c0 + 1` but
`t1.c0` would need no processing
This type of unwrapping is already done in
https://github.com/apache/arrow-datafusion/blob/master/datafusion/optimizer/src/unwrap_cast_in_comparison.rs
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]