Dandandan opened a new pull request, #22677: URL: https://github.com/apache/datafusion/pull/22677
## Which issue does this PR close? - N/A (hash join probe-side micro-optimization). ## Rationale for this change In the hash join probe path, `JoinHashMap::get_matched_indices_with_limit_offset` handles the non-unique-key (chained) case by, for each probe row, looking up the head of the collision chain (`map.find`) and immediately walking that chain. Because the `map.find` result feeds straight into the chain walk that consumes it, the hash-table probes are effectively serialized: each row's hash-table cache miss must resolve before the next row's lookup begins. The `map.find` miss is the dominant cost in this path, and these lookups are independent across probe rows — an ideal candidate for memory-level parallelism. ## What changes are included in this PR? Process the probe rows in windows of 16 with two phases per window: 1. **Lookup phase** — resolve the head-of-chain index (`map.find`) for every row in the window into a small stack array. These probes are independent, so their cache misses overlap (several outstanding at once) instead of each stalling the next row. 2. **Traversal phase** — walk the chains in probe-row order, exactly as before. `traverse_chain` remains the sole authority for the per-call output limit and resume offset, and is still invoked in probe-row order, so the **limit/offset resume protocol is unchanged**. Heads looked up for rows past the output limit are simply discarded and recomputed on the next call (the lookup is a pure function of the hash). The unique-key fast path and the mid-chain resume handling are untouched. ## Are these changes tested? Yes: - New unit test `test_limit_offset_window_boundary_matches_unbounded` asserts the windowed lookup yields output identical to a single unbounded call, across more than one window (20 probe rows) with chains/singletons/misses, for several limits — including ones that split mid-chain and don't divide the window size. - `cargo test -p datafusion-physical-plan --lib joins::` — 968 tests pass. - `cargo test -p datafusion-sqllogictest --test sqllogictests -- joins` — passes. ## Are there any user-facing changes? No. Internal performance optimization with identical results. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
