jcsherin commented on issue #12955:
URL: https://github.com/apache/datafusion/issues/12955#issuecomment-2417351688

   @vbarua Thanks. This bug report is well written.
   
   ```sql
   DataFusion CLI v42.0.0
   > EXPLAIN SELECT * FROM VALUES ('a'), ('b'), ('b'), ('c'), ('c'), ('c')
   INTERSECT ALL
   SELECT * FROM VALUES ('b'), ('b'), ('b'), ('c'), ('c');
   
+---------------+-----------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                       |
   
+---------------+-----------------------------------------------------------------------------------+
   | logical_plan  | LeftSemi Join: column1 = column1                           
                       |
   |               |   Values: (Utf8("a")), (Utf8("b")), (Utf8("b")), 
(Utf8("c")), (Utf8("c"))...      |
   |               |   Values: (Utf8("b")), (Utf8("b")), (Utf8("b")), 
(Utf8("c")), (Utf8("c"))         |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192                
                       |
   |               |   HashJoinExec: mode=Partitioned, join_type=LeftSemi, 
on=[(column1@0, column1@0)] |
   |               |     ValuesExec                                             
                       |
   |               |     ValuesExec                                             
                       |
   |               |                                                            
                       |
   
+---------------+-----------------------------------------------------------------------------------+
   ```
   
   The query generates a left semi-join plan and therefore will return only LHS 
values. If RHS happens to have the minimum number of duplicates, then this 
query will always return incorrect results.
   
   - This bug is also related: #12956
   
   ```sql
   DataFusion CLI v42.0.0
   > EXPLAIN SELECT * FROM VALUES ('a'), ('b'), ('b'), ('c'), ('c'), ('c')
   EXCEPT ALL
   SELECT * FROM VALUES ('b'), ('b'), ('b'), ('c'), ('c');
   
+---------------+-----------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                       |
   
+---------------+-----------------------------------------------------------------------------------+
   | logical_plan  | LeftAnti Join: column1 = column1                           
                       |
   |               |   Values: (Utf8("a")), (Utf8("b")), (Utf8("b")), 
(Utf8("c")), (Utf8("c"))...      |
   |               |   Values: (Utf8("b")), (Utf8("b")), (Utf8("b")), 
(Utf8("c")), (Utf8("c"))         |
   | physical_plan | CoalesceBatchesExec: target_batch_size=8192                
                       |
   |               |   HashJoinExec: mode=Partitioned, join_type=LeftAnti, 
on=[(column1@0, column1@0)] |
   |               |     ValuesExec                                             
                       |
   |               |     ValuesExec                                             
                       |
   |               |                                                            
                       |
   
+---------------+-----------------------------------------------------------------------------------+
   ``` 
   
   Here the query generates a left anti-join. So it will always exclude rows 
which match in RHS.
   
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to