mbutrovich commented on issue #22854:
URL: https://github.com/apache/datafusion/issues/22854#issuecomment-4670537763

   > [@mbutrovich](https://github.com/mbutrovich) used a separate-streams approach when optimizing semi/anti joins in SMJ, which seems related to the same concern:
   > 
   > * [perf: specialized SemiAntiSortMergeJoinStream #20806](https://github.com/apache/datafusion/pull/20806)
   
   I haven't dug into our hash join implementation recently, but I agree. When 
working on SMJ, I found that the intermediate state, data structures, and state 
machine could be optimized completely differently for semi/anti/mark joins, so 
separate streams were the best solution. Originally I'd split them into their 
own operator entirely, but that was a bit too aggressive. +1 for separate 
streams for semi/anti/mark joins, which basically just need a bitmap.
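
   To illustrate the "basically just need a bitmap" point, here is a minimal 
sketch using only the Rust standard library. It is not DataFusion code: the 
names `FilterJoin`, `match_bitmap`, `filter_join`, and `mark_join` are 
hypothetical, keys are plain `i64` values, and a `Vec<bool>` stands in for a 
real bitmap. The assumption is that the probe side is the side whose rows are 
preserved.

```rust
use std::collections::HashSet;

/// Hypothetical join kinds for this sketch (not DataFusion's `JoinType`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum FilterJoin {
    Semi,
    Anti,
}

/// One flag per probe row: true if at least one build row has the same key.
/// No matched row pairs are materialized, only this per-row flag.
pub fn match_bitmap(build_keys: &[i64], probe_keys: &[i64]) -> Vec<bool> {
    let build: HashSet<i64> = build_keys.iter().copied().collect();
    probe_keys.iter().map(|k| build.contains(k)).collect()
}

/// Semi keeps probe rows whose flag is set; anti keeps those whose flag is unset.
pub fn filter_join(kind: FilterJoin, build_keys: &[i64], probe_keys: &[i64]) -> Vec<i64> {
    let matched = match_bitmap(build_keys, probe_keys);
    let keep_matched = kind == FilterJoin::Semi;
    probe_keys
        .iter()
        .copied()
        .zip(matched)
        .filter(|(_, m)| *m == keep_matched)
        .map(|(k, _)| k)
        .collect()
}

/// Mark emits every probe row together with its flag as an extra column.
pub fn mark_join(build_keys: &[i64], probe_keys: &[i64]) -> Vec<(i64, bool)> {
    let matched = match_bitmap(build_keys, probe_keys);
    probe_keys.iter().copied().zip(matched).collect()
}
```

   All three variants share the same per-row flag and differ only in how it is 
read at output time, which is why none of them needs the pair-producing state 
of an inner or outer join.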


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

