Re: [I] Expression simplifier does not simplify `A = B AND B = A` [arrow-datafusion]

via GitHub Wed, 03 Jan 2024 12:39:11 -0800


Jefffrey commented on issue #8724:
URL: https://github.com/apache/arrow-datafusion/issues/8724#issuecomment-1875941267


   Just a note that this applies not only to `<col> <op> <literal>`, but also to `<col1> <op> <col2>`:
   
   ```
   DataFusion CLI v34.0.0
   ❯ CREATE TABLE kumachan1 (wakana string) AS VALUES ('ookuma');
   0 rows in set. Query took 0.004 seconds.
   
   ❯ CREATE TABLE kumachan2 (wakana string) AS VALUES ('ookuma');
   0 rows in set. Query took 0.003 seconds.
   
   ❯ select * from kumachan1 k1 join kumachan2 k2 on k1.wakana = k2.wakana and k2.wakana = k1.wakana;
   +--------+--------+
   | wakana | wakana |
   +--------+--------+
   | ookuma | ookuma |
   +--------+--------+
   1 row in set. Query took 0.007 seconds.
   
   ❯ explain select * from kumachan1 k1 join kumachan2 k2 on k1.wakana = k2.wakana and k2.wakana = k1.wakana;
   +---------------+----------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                               |
   +---------------+----------------------------------------------------------------------------------------------------+
   | logical_plan  | Inner Join: k1.wakana = k2.wakana, k1.wakana = k2.wakana                                           |
   |               |   SubqueryAlias: k1                                                                                |
   |               |     TableScan: kumachan1 projection=[wakana]                                                       |
   |               |   SubqueryAlias: k2                                                                                |
   |               |     TableScan: kumachan2 projection=[wakana]                                                       |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192                                                        |
   |               |   HashJoinExec: mode=Partitioned, join_type=Inner, on=[(wakana@0, wakana@0), (wakana@0, wakana@0)] |
   |               |     CoalesceBatchesExec: target_batch_size=8192                                                    |
   |               |       RepartitionExec: partitioning=Hash([wakana@0, wakana@0], 12), input_partitions=1             |
   |               |         MemoryExec: partitions=1, partition_sizes=[1]                                              |
   |               |     CoalesceBatchesExec: target_batch_size=8192                                                    |
   |               |       RepartitionExec: partitioning=Hash([wakana@0, wakana@0], 12), input_partitions=1             |
   |               |         MemoryExec: partitions=1, partition_sizes=[1]                                              |
   |               |                                                                                                    |
   +---------------+----------------------------------------------------------------------------------------------------+
   2 rows in set. Query took 0.005 seconds
   ```
   
   So ideally, the join condition filter should be simplified before the `extract_equijoin_predicate` optimizer rule runs, so that the duplicated `k1.wakana = k2.wakana` key never reaches the join. I only used a literal in the original issue because it made for a smaller MRE.
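   
   For anyone picking this up, here is a rough sketch of the idea on a toy expression type. It does not use DataFusion's actual `Expr` or `ExprSimplifier` API; the `Expr` enum and the `col`, `lit`, `eq`, `and`, `canonical_key`, `dedup_conjuncts`, and `demo` names are all hypothetical. The approach assumed here is: split the condition into its `AND`ed conjuncts, treat `a = b` and `b = a` as the same predicate by ordering the operands, and drop repeats.
   
   ```rust
   use std::collections::HashSet;
   
   // Toy expression tree for illustration only; NOT DataFusion's `Expr`.
   #[derive(Debug, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)]
   enum Expr {
       Column(String),
       Literal(String),
       Eq(Box<Expr>, Box<Expr>),
       And(Box<Expr>, Box<Expr>),
   }
   
   // Hypothetical constructor helpers.
   fn col(name: &str) -> Expr { Expr::Column(name.to_string()) }
   fn lit(value: &str) -> Expr { Expr::Literal(value.to_string()) }
   fn eq(l: Expr, r: Expr) -> Expr { Expr::Eq(Box::new(l), Box::new(r)) }
   fn and(l: Expr, r: Expr) -> Expr { Expr::And(Box::new(l), Box::new(r)) }
   
   // Flatten nested ANDs into a list of conjuncts, left to right.
   fn split_conjunction(expr: Expr, out: &mut Vec<Expr>) {
       match expr {
           Expr::And(l, r) => {
               split_conjunction(*l, out);
               split_conjunction(*r, out);
           }
           other => out.push(other),
       }
   }
   
   // Key used for duplicate detection: for a top-level `=`, put the smaller
   // operand (by the derived ordering) first, so `a = b` and `b = a` share
   // a key. Nested expressions are left as they are.
   fn canonical_key(expr: &Expr) -> Expr {
       match expr {
           Expr::Eq(l, r) if l > r => Expr::Eq(r.clone(), l.clone()),
           other => other.clone(),
       }
   }
   
   // Keep the first occurrence of each conjunct (as originally written)
   // and drop later conjuncts whose key was already seen.
   fn dedup_conjuncts(expr: Expr) -> Expr {
       let mut conjuncts = Vec::new();
       split_conjunction(expr, &mut conjuncts);
   
       let mut seen = HashSet::new();
       let mut kept = Vec::new();
       for conjunct in conjuncts {
           if seen.insert(canonical_key(&conjunct)) {
               kept.push(conjunct);
           }
       }
       // `split_conjunction` always yields at least one conjunct.
       kept.into_iter().reduce(and).expect("at least one conjunct")
   }
   
   // The join condition from the example above.
   fn demo() -> Expr {
       dedup_conjuncts(and(
           eq(col("k1.wakana"), col("k2.wakana")),
           eq(col("k2.wakana"), col("k1.wakana")),
       ))
   }
   ```
   
   With this, `demo()` collapses the condition to the single predicate `k1.wakana = k2.wakana`, and the same function handles the `<col> = <literal>` case from the original issue. A real rule would also need to deal with nested expressions and with operators other than `=`.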
   
   > @Jefffrey if you agree with adding this directly as a rule to the 
ExprSimplifier, I think it would potentially make a good first project for 
someone
   
   Yes, this sounds good.

