mbutrovich commented on issue #22854: URL: https://github.com/apache/datafusion/issues/22854#issuecomment-4670537763
> [@mbutrovich](https://github.com/mbutrovich) used a separate-streams approach when optimizing semi/anti joins in SMJ, which seems related to the same concern:
>
> * [perf: specialized SemiAntiSortMergeJoinStream #20806](https://github.com/apache/datafusion/pull/20806)

I haven't dug into our hash join implementation recently, but I agree. When working on SMJ, I found that the intermediate state/data structures, state machine, etc. could be optimized so differently for semi/anti/mark that separate streams were the best solution. Originally I'd split them into their own operator entirely, but that was a bit too aggressive.

+1 for separate streams for semi/anti/mark, which basically just need a bitmap.
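To illustrate the "basically just need a bitmap" point, here is a minimal sketch of why these join types need so little state. It is not DataFusion's implementation: `JoinKind`, `matched_bitmap`, and `emit` are hypothetical names, keys are plain `i64` slices assumed to be sorted ascending, NULL handling is ignored, and a `Vec<bool>` stands in for a packed bitmap.

```rust
/// Hypothetical join kinds that only need "did this outer row match?".
#[derive(Clone, Copy)]
enum JoinKind {
    Semi,
    Anti,
    Mark,
}

/// Merge two ascending key slices and record, per outer row, whether any
/// inner row has the same key. No inner rows are ever materialized.
fn matched_bitmap(outer: &[i64], inner: &[i64]) -> Vec<bool> {
    let mut matched = vec![false; outer.len()];
    let mut j = 0;
    for (i, &key) in outer.iter().enumerate() {
        // Skip inner keys that sort before the current outer key.
        while j < inner.len() && inner[j] < key {
            j += 1;
        }
        // `j` is not advanced past an equal key, so duplicate outer keys
        // still see the same inner match.
        matched[i] = j < inner.len() && inner[j] == key;
    }
    matched
}

/// Produce the output for each join kind from the bitmap alone.
/// Semi keeps matched rows, anti keeps unmatched rows, and mark keeps
/// every row and attaches the match flag as an extra column.
fn emit(outer: &[i64], matched: &[bool], kind: JoinKind) -> Vec<(i64, Option<bool>)> {
    outer
        .iter()
        .zip(matched)
        .filter_map(|(&key, &m)| match kind {
            JoinKind::Semi => m.then_some((key, None)),
            JoinKind::Anti => (!m).then_some((key, None)),
            JoinKind::Mark => Some((key, Some(m))),
        })
        .collect()
}
```

In this sketch all three kinds share the same one-bit-per-outer-row state and differ only in how the bit is read at output time, whereas an inner or outer join would also have to track and materialize matching inner rows.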
