neilconway opened a new issue, #22931: URL: https://github.com/apache/datafusion/issues/22931
### Is your feature request related to a problem or challenge? To evaluate a semi join, we support two orientations: `LeftSemi` or `RightSemi` (analogously for anti and mark joins; I'll just refer to semijoins here to simplify the discussion). Under `RightSemi`, we build the non-preserved input and stream the preserved input; these are swapped for `LeftSemi`. While it might seem like these two orientations are symmetrical, there are actually significant differences in evaluation behavior between them: * The build-side hash table has to be resident in memory; all else being equal, building the smaller join input is a good general rule, and that's the main rule we follow today. * `RightSemi` only needs to store the join keys for the build side; `LeftSemi` needs to store wider rows. By definition, the consumer of a semijoin can't be interested in any values from the non-preserved side of the join. So even if the non-preserved side has more rows than the preserved side, building the hash table on the non-preserved side might still require less memory. * `RightSemi` preserves the partitioning of the preserved input, whereas `LeftSemi` always emits with `UnknownPartitioning`. * `RightSemi` works better with dynamic filter pushdown: XXX elaborate Two additional factors that might change: * `RightSemi` only needs to build on distinct values from the non-preserved side. In the future, we can optimize `RightSemi` to discard duplicate build-side rows. We don't do that today but we might in the future (#22930) * `RightSemi` allows emitting join results incrementally: as we see each probe row, we can immediately determine if it should be output or not. Whereas `LeftSemi` consumes the _entire_ non-preserved side, marking which of the preserved-side rows matched, and only at the end of the non-preserved input stream can we do a pass over the matched bitmap to determine which preserved-side rows to emit. This is not fundamental though; probably worth fixing LeftSemi to emit incrementally (#22929) The current optimizer rules don't reflect this: * `LeftSemi` and `RightSemi` are considered symmetrically; whichever semijoin input is predicted to be smaller is placed on the build side * If there are absent stats, `LeftSemi` is chosen I think revising these rules as follows would make more sense: * Prefer `RightSemi` over `LeftSemi`, _unless_ the non-preserved input is k times larger than the preserved input. Choosing `k` is a bit arbitrary, but a value in the range of 2-4 seems reasonable. * If there are absent stats, prefer `RightSemi` ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
