Dandandan opened a new pull request, #22653: URL: https://github.com/apache/datafusion/pull/22653
## Which issue does this PR close?

- Not applicable.

## Rationale for this change

Right semi and right anti joins only need to know whether each probe-side row has at least one matching build-side row when there is no residual join filter. The generic hash join path currently materializes every duplicate build-side match and later deduplicates or inverts the probe indices, which is unnecessary work when the existence side has duplicated keys.

In a focused local throwaway benchmark with 10,000 duplicate build rows and 10,000 probe rows, the old lookup path enumerated 100,000,000 candidate pairs in 183.872 ms, while the new existence lookup returned 10,000 probe matches in 45.791 µs.

## What changes are included in this PR?

- Add a hash-map existence probe that stops walking a duplicate chain after the first equality-confirmed match.
- Add an ArrayMap membership probe for the same right semi/anti use case.
- Route `RightSemi` and `RightAnti` hash joins without residual filters through the existence path.
- Keep the generic path for joins with residual filters, where duplicate build rows may affect filter results.
- Add a unit test covering the early-stop behavior for duplicate build-side matches.

## Are these changes tested?

- `cargo fmt --all`
- `cargo clippy --all-targets --all-features -- -D warnings`
- `cargo test -p datafusion-physical-plan hash_join --lib`
- `cargo test -p datafusion-physical-plan joins::join_hash_map::tests::test_probe_indices_with_any_match_stops_after_first_match --lib`

## Are there any user-facing changes?

No API or behavior changes are expected. This is a physical execution optimization for right semi/anti hash joins without residual filters.
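The difference between the generic lookup and the existence lookup can be illustrated with a minimal, self-contained Rust sketch. This is not DataFusion's code: `ChainedIndex`, `toy_hash`, `DenseMembership` and `select_probe_rows` are hypothetical names, the chained layout (a map from hash to the newest row id plus a `next` array of 1-based ids) is only an assumed model of a build-side join index, and `DenseMembership` is a generic dense-array membership table standing in for the ArrayMap probe, whose real layout the PR does not describe. The toy hash is deliberately weak so that different keys share a chain, which is why each candidate must be confirmed by key equality before it counts as a match.

```rust
use std::collections::HashMap;

// Toy hash (hypothetical), weak on purpose so distinct keys can share a chain.
fn toy_hash(key: i64) -> u64 {
    (key as u64) % 16
}

// Hypothetical chained index: `heads` maps a hash to the 1-based id of the
// newest build row with that hash; `next[row]` is the previous id (0 = end).
struct ChainedIndex {
    heads: HashMap<u64, usize>,
    next: Vec<usize>,
    keys: Vec<i64>,
}

impl ChainedIndex {
    fn build(keys: &[i64]) -> Self {
        let mut heads = HashMap::new();
        let mut next = vec![0; keys.len()];
        for (row, &key) in keys.iter().enumerate() {
            // Link the new row to the old head (if any), then make it the head.
            next[row] = heads.insert(toy_hash(key), row + 1).unwrap_or(0);
        }
        Self { heads, next, keys: keys.to_vec() }
    }

    // Walks the chain for `key`, calling `keep_going(build_row)` on each
    // equality-confirmed match; stops early when the callback returns false.
    fn walk(&self, key: i64, mut keep_going: impl FnMut(usize) -> bool) {
        let mut id = self.heads.get(&toy_hash(key)).copied().unwrap_or(0);
        while id != 0 {
            let row = id - 1;
            // Equal hashes are only candidates; confirm with key equality.
            if self.keys[row] == key && !keep_going(row) {
                return;
            }
            id = self.next[row];
        }
    }

    // Generic path: every (build_row, probe_row) pair with equal keys.
    fn all_matches(&self, probe: &[i64]) -> Vec<(usize, usize)> {
        let mut pairs = Vec::new();
        for (p, &key) in probe.iter().enumerate() {
            self.walk(key, |b| {
                pairs.push((b, p));
                true
            });
        }
        pairs
    }

    // Existence path: one flag per probe row, stopping at the first match.
    fn probe_any_match(&self, probe: &[i64]) -> Vec<bool> {
        probe
            .iter()
            .map(|&key| {
                let mut found = false;
                self.walk(key, |_| {
                    found = true;
                    false
                });
                found
            })
            .collect()
    }
}

// Hypothetical dense membership table: one flag per key value in [min, max],
// built only when that range fits in `max_len` slots.
struct DenseMembership {
    min: i64,
    present: Vec<bool>,
}

impl DenseMembership {
    fn build(keys: &[i64], max_len: usize) -> Option<Self> {
        let (min, max) = (*keys.iter().min()?, *keys.iter().max()?);
        let len = usize::try_from(max.checked_sub(min)?).ok()?.checked_add(1)?;
        if len > max_len {
            return None;
        }
        let mut present = vec![false; len];
        for &key in keys {
            present[(key - min) as usize] = true;
        }
        Some(Self { min, present })
    }

    // Membership is a subtraction plus a bounds-checked index; no chain walk.
    fn contains(&self, key: i64) -> bool {
        key.checked_sub(self.min)
            .and_then(|off| usize::try_from(off).ok())
            .and_then(|slot| self.present.get(slot).copied())
            .unwrap_or(false)
    }
}

// Right semi keeps probe rows with a match; right anti keeps rows without one.
fn select_probe_rows(matched: &[bool], anti: bool) -> Vec<usize> {
    matched.iter().enumerate().filter(|&(_, &m)| m != anti).map(|(row, _)| row).collect()
}

fn main() {
    // Ten duplicate build rows with key 7, plus key 23 (23 % 16 == 7, same chain).
    let mut build = vec![7_i64; 10];
    build.push(23);
    let index = ChainedIndex::build(&build);
    // 39 also hashes to 7 but equals no build key; 8 hashes to an empty chain.
    let probe = [7_i64, 23, 39, 8];

    println!("generic pairs: {}", index.all_matches(&probe).len()); // prints "generic pairs: 11"
    let matched = index.probe_any_match(&probe);
    println!("semi rows: {:?}", select_probe_rows(&matched, false)); // prints "semi rows: [0, 1]"
    println!("anti rows: {:?}", select_probe_rows(&matched, true)); // prints "anti rows: [2, 3]"

    let dense = DenseMembership::build(&build, 1024).expect("small key range");
    let dense_matched: Vec<bool> = probe.iter().map(|&k| dense.contains(k)).collect();
    assert_eq!(dense_matched, matched);
}
```

In this sketch, probing key 7 visits only two chain entries (the colliding 23, then the first 7) instead of all eleven, while the generic path still enumerates all ten duplicate pairs for that one probe row. Both probes return the same per-row flags, so the semi/anti output depends only on whether a match exists. That only holds because there is no residual filter; with a filter, each duplicate build row would have to be evaluated, which is why the PR keeps the generic path in that case.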
