Jefffrey commented on issue #8724:
URL: https://github.com/apache/arrow-datafusion/issues/8724#issuecomment-1875941267
Just a note that this doesn't apply only to `<col> <op> <literal>`, but
also to `<col1> <op> <col2>`:
```
DataFusion CLI v34.0.0
❯ CREATE TABLE kumachan1 (wakana string) AS VALUES ('ookuma');
0 rows in set. Query took 0.004 seconds.
❯ CREATE TABLE kumachan2 (wakana string) AS VALUES ('ookuma');
0 rows in set. Query took 0.003 seconds.
❯ select * from kumachan1 k1 join kumachan2 k2 on k1.wakana = k2.wakana and k2.wakana = k1.wakana;
+--------+--------+
| wakana | wakana |
+--------+--------+
| ookuma | ookuma |
+--------+--------+
1 row in set. Query took 0.007 seconds.
❯ explain select * from kumachan1 k1 join kumachan2 k2 on k1.wakana = k2.wakana and k2.wakana = k1.wakana;
+---------------+----------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                               |
+---------------+----------------------------------------------------------------------------------------------------+
| logical_plan  | Inner Join: k1.wakana = k2.wakana, k1.wakana = k2.wakana                                           |
|               |   SubqueryAlias: k1                                                                                |
|               |     TableScan: kumachan1 projection=[wakana]                                                       |
|               |   SubqueryAlias: k2                                                                                |
|               |     TableScan: kumachan2 projection=[wakana]                                                       |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192                                                        |
|               |   HashJoinExec: mode=Partitioned, join_type=Inner, on=[(wakana@0, wakana@0), (wakana@0, wakana@0)] |
|               |     CoalesceBatchesExec: target_batch_size=8192                                                    |
|               |       RepartitionExec: partitioning=Hash([wakana@0, wakana@0], 12), input_partitions=1             |
|               |         MemoryExec: partitions=1, partition_sizes=[1]                                              |
|               |     CoalesceBatchesExec: target_batch_size=8192                                                    |
|               |       RepartitionExec: partitioning=Hash([wakana@0, wakana@0], 12), input_partitions=1             |
|               |         MemoryExec: partitions=1, partition_sizes=[1]                                              |
|               |                                                                                                    |
+---------------+----------------------------------------------------------------------------------------------------+
2 rows in set. Query took 0.005 seconds
```
So ideally, before the `extract_equijoin_predicate` optimizer rule runs, the
join condition filter should be simplified first to remove that duplication
(here `k1.wakana = k2.wakana` ends up in the join keys twice). I only used a
literal in the original issue because it made for a smaller MRE.
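To illustrate the kind of simplification meant here, below is a minimal, self-contained sketch in Rust. It does not use DataFusion's actual `Expr` type or the `ExprSimplifier` API; `EqPair` and `dedup_equalities` are hypothetical names made up for this example. The idea is only that `a = b` and `b = a` are the same predicate, so ordering the operands into a canonical form lets the duplicates be dropped:

```rust
use std::collections::HashSet;

/// Toy stand-in for an equality predicate `left = right`.
/// Hypothetical type for illustration, not DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct EqPair {
    left: String,
    right: String,
}

impl EqPair {
    fn new(left: &str, right: &str) -> Self {
        Self {
            left: left.to_string(),
            right: right.to_string(),
        }
    }

    /// Equality is commutative, so sort the two operands to get a
    /// canonical form: `b = a` and `a = b` both become `a = b`.
    fn canonical(&self) -> EqPair {
        if self.left <= self.right {
            self.clone()
        } else {
            EqPair {
                left: self.right.clone(),
                right: self.left.clone(),
            }
        }
    }
}

/// Drop predicates that duplicate an earlier one (up to operand order),
/// keeping the first occurrence and the original ordering.
fn dedup_equalities(preds: &[EqPair]) -> Vec<EqPair> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for p in preds {
        // `insert` returns false if the canonical form was already seen.
        if seen.insert(p.canonical()) {
            out.push(p.clone());
        }
    }
    out
}
```

Applied to the two predicates from the join above (`k1.wakana = k2.wakana` and `k2.wakana = k1.wakana`), this keeps only the first one. A real rule would have to work on arbitrary expressions rather than string pairs.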
> @Jefffrey if you agree with adding this directly as a rule to the
> ExprSimplifier, I think it would potentially make a good first project for
> someone
Yes, this sounds good.