neilconway commented on PR #22652:
URL: https://github.com/apache/datafusion/pull/22652#issuecomment-4638905090

   @Dandandan Q23 doesn't result in a plan change and I can't repro that 
locally; I think that one is just noise.
   
   Q14 does indeed repro. The plan _should_ strictly be an improvement (just 
swapping inner joins for semi-joins); it regresses because `RightSemi` joins 
are actually _slower_ than inner joins in DF right now :) We basically compute 
an inner join and then do an extra pass to eliminate any duplicates 
(`get_semi_indices`). Some ideas:
   
   (1) We could pass down a flag indicating that the join is unique on the join 
keys (case 2a described above), so we can skip the duplicate removal. This 
_should_ get semi-joins back to parity with inner joins.
   (2) Alternatively, we could write a specialized join kernel for semi and 
anti joins. This could be a significant win, not just a return to parity.
   
   I'm inclined to pursue (2); I'm taking a look at that now.
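
   To make the difference concrete, here is a rough, standalone sketch of the two strategies on plain `i64` keys. This is not DataFusion's code: the function names (`semi_via_inner_join`, `semi_specialized`, `anti_specialized`) and the slice-based signatures are hypothetical, and real kernels work on Arrow batches with hashed multi-column keys. It only illustrates why an inner join plus a dedup pass does more work than a dedicated semi/anti probe.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sketch of "inner join, then remove duplicates":
/// materialize every (build, probe) match, then dedup the probe indices.
fn semi_via_inner_join(build: &[i64], probe: &[i64]) -> Vec<usize> {
    // Build side: key -> all build row indices with that key.
    let mut table: HashMap<i64, Vec<usize>> = HashMap::new();
    for (i, k) in build.iter().enumerate() {
        table.entry(*k).or_default().push(i);
    }

    // Inner join: one pair per match, so duplicate build keys fan out.
    let mut pairs: Vec<(usize, usize)> = Vec::new();
    for (j, k) in probe.iter().enumerate() {
        if let Some(rows) = table.get(k) {
            for &i in rows {
                pairs.push((i, j));
            }
        }
    }

    // Extra pass: keep each probe index once. Pairs were emitted in probe
    // order, so equal indices are adjacent and `dedup` is sufficient.
    let mut out: Vec<usize> = pairs.into_iter().map(|(_, j)| j).collect();
    out.dedup();
    out
}

/// Hypothetical specialized semi-join probe: only key existence matters,
/// so a set suffices and each probe row is emitted at most once.
fn semi_specialized(build: &[i64], probe: &[i64]) -> Vec<usize> {
    let keys: HashSet<i64> = build.iter().copied().collect();
    probe
        .iter()
        .enumerate()
        .filter(|(_, k)| keys.contains(*k))
        .map(|(j, _)| j)
        .collect()
}

/// Hypothetical anti-join counterpart: emit probe rows with no match.
fn anti_specialized(build: &[i64], probe: &[i64]) -> Vec<usize> {
    let keys: HashSet<i64> = build.iter().copied().collect();
    probe
        .iter()
        .enumerate()
        .filter(|(_, k)| !keys.contains(*k))
        .map(|(j, _)| j)
        .collect()
}
```

   With duplicate keys on the build side, the first version materializes one pair per match before discarding most of them, while the specialized versions touch each probe row once. Idea (1) corresponds to skipping the `dedup` step when the build keys are known to be unique.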


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

