ACking-you opened a new issue, #8147:
URL: https://github.com/apache/arrow-datafusion/issues/8147
### Describe the bug
Timestamp types with timezone are not considered in Join's equivalent
expression
### To Reproduce
We have a `test_table` with a `date` field of type
`Timestamp(TimeUnit::Millisecond, Some("UTC".into()))`, and the SQL `select *
from test_table a join test_table b on a.date = b.date` produces the following
Optimized LogicalPlan and PhysicalPlan:
```text
Optimized LogicalPlan:
Inner Join: Filter: a.date = b.date
SubqueryAlias: a
TableScan: test_table projection=[a, b, date]
SubqueryAlias: b
TableScan: test_table projection=[a, b, date]
Optimized PhysicalPlan:
NestedLoopJoinExec: join_type=Inner, filter=date@0 = date@1
RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1
CustomExec
CustomExec
```
### Expected behavior
As expected, the equal filter condition is optimized to be an on condition
and eventually generates a `HashJoinExec`.
```text
Optimized LogicalPlan:
Inner Join: a.date = b.date
SubqueryAlias: a
TableScan: test_table projection=[a, b, date]
SubqueryAlias: b
TableScan: test_table projection=[a, b, date]
Optimized PhysicalPlan:
HashJoinExec: mode=Partitioned, join_type=Inner, on=[(date@2, date@2)]
xxx
CustomExec
xxx
CustomExec
```
### Additional context
I tried to look at the detailed source code implementation and found that it
should be caused by not considering the timezone of the timestamp type in the
function
[can_hash](https://docs.rs/datafusion/latest/datafusion/logical_expr/utils/fn.can_hash.html).
If that's the case, I might try to fix it
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]