Dandandan opened a new issue, #3946:
URL: https://github.com/apache/arrow-datafusion/issues/3946

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   Currently, we have a AntiJoin option, this is basically a left anti join.
   This works nicely, but because we don't have a  RightAntiJoin we can't 
change the build/probe order.
   
   If we have a RightAntiJoin we could support this - changing to use a right 
anti join whenever the build side is larger than the probe side. So basically - 
if there is no match for the right on the left side, include the row in the 
results.
   
   Here I found an example in Ballista (q22) where applying the optimization 
doesn't produce the optimal order:
   
   ```
                   HashJoinExec: mode=Partitioned, join_type=Anti, on=[(Column 
{ name: "c_custkey", index: 0 }, Column { name: "o_custkey", index: 0 })], 
metrics=[output_rows=0, elapsed_compute=65.054µs, spill_count=0, 
spilled_bytes=0, mem_used=0]
                     CoalesceBatchesExec: target_batch_size=4096, metrics=[]
                       ShuffleReaderExec: partitions=16, 
metrics=[output_rows=14999999, elapsed_compute=707.327197ms, spill_count=0, 
spilled_bytes=0, mem_used=0]
                     CoalesceBatchesExec: target_batch_size=4096, metrics=[]
                       ShuffleReaderExec: partitions=16, 
metrics=[output_rows=0, elapsed_compute=1ns, spill_count=0, spilled_bytes=0, 
mem_used=0]
   ```
   **Describe the solution you'd like**
   
   Implement RightAntiJoin and swap join order if the build side is bigger.
   
   **Describe alternatives you've considered**
   
   **Additional context**


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to