andygrove opened a new pull request, #20464:
URL: https://github.com/apache/datafusion/pull/20464

   ## Summary
   - Adds a criterion micro-benchmark for `SortMergeJoinExec` that measures join kernel performance in isolation
   - Pre-sorted RecordBatches are fed directly into the join operator, avoiding sort/scan overhead
   - Data is constructed once and reused across iterations; only the `TestMemoryExec` wrapper is recreated per iteration
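
The "build once, re-wrap per iteration" pattern from the last bullet can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in this PR: `Batch`, `MemorySource`, `build_batches`, and `run_iterations` are hypothetical stand-ins for Arrow's `RecordBatch`, the `TestMemoryExec` wrapper, and the benchmark loop.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a record batch: a single column of join keys.
struct Batch {
    keys: Vec<i64>,
}

/// Hypothetical stand-in for the per-iteration wrapper.
/// It holds shared handles to the batches, never copies of the row data.
struct MemorySource {
    batches: Vec<Arc<Batch>>,
}

impl MemorySource {
    fn new(batches: &[Arc<Batch>]) -> Self {
        // `to_vec` clones the `Arc` handles only (a reference-count bump each).
        Self { batches: batches.to_vec() }
    }

    fn total_rows(&self) -> usize {
        self.batches.iter().map(|b| b.keys.len()).sum()
    }
}

/// Expensive setup: runs once, outside the measured loop.
fn build_batches(num_batches: usize, rows_per_batch: usize) -> Vec<Arc<Batch>> {
    (0..num_batches)
        .map(|b| {
            let start = (b * rows_per_batch) as i64;
            let keys = (start..start + rows_per_batch as i64).collect();
            Arc::new(Batch { keys })
        })
        .collect()
}

/// Measured loop: only the cheap wrapper is rebuilt on each iteration.
fn run_iterations(batches: &[Arc<Batch>], iterations: usize) -> usize {
    let mut rows_seen = 0;
    for _ in 0..iterations {
        let source = MemorySource::new(batches);
        rows_seen += source.total_rows();
    }
    rows_seen
}
```

Calling `build_batches` once and passing the result to `run_iterations` keeps the data construction out of the timed region, which is the property the bullet describes.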
   
   ## Benchmarks
   
   Five scenarios covering the most common sort-merge join (SMJ) patterns:
   
   | Benchmark | Join Type | Key Pattern |
   |-----------|-----------|-------------|
   | `inner_1to1` | Inner | 100K unique keys per side |
   | `inner_1to10` | Inner | 10K keys, ~10 rows per key |
   | `left_1to1_unmatched` | Left | ~5% unmatched on left side |
   | `left_semi_1to10` | Left Semi | 10K keys |
   | `left_anti_partial` | Left Anti | Partial key overlap |
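
As a rough sketch of how pre-sorted key columns with these shapes might be generated, the helpers below use plain `Vec<i64>` keys and no Arrow types. The function names, the "skip every Nth key" scheme for unmatched rows, and the pair-counting check are assumptions made for illustration; the PR's actual data generation may differ.

```rust
/// Sorted keys with exactly one row per key: 0, 1, 2, ...
fn unique_keys(rows: usize) -> Vec<i64> {
    (0..rows as i64).collect()
}

/// Sorted keys where every key appears `rows_per_key` times in a row.
fn repeated_keys(num_keys: usize, rows_per_key: usize) -> Vec<i64> {
    (0..num_keys as i64)
        .flat_map(|k| std::iter::repeat(k).take(rows_per_key))
        .collect()
}

/// Sorted unique keys in 0..rows with every `every`-th key left out.
/// Joined against `unique_keys(rows)`, `every = 20` leaves 5% of rows unmatched.
fn keys_skipping(rows: usize, every: usize) -> Vec<i64> {
    (0..rows as i64)
        .filter(|k| *k % every as i64 != 0)
        .collect()
}

/// Counts inner-join output rows for two sorted key slices with a merge scan.
/// Useful as a sanity check on the generated data.
fn count_join_pairs(left: &[i64], right: &[i64]) -> usize {
    let (mut i, mut j, mut pairs) = (0, 0, 0);
    while i < left.len() && j < right.len() {
        if left[i] < right[j] {
            i += 1;
        } else if left[i] > right[j] {
            j += 1;
        } else {
            // Equal keys: every row in the left run pairs with every row in the right run.
            let key = left[i];
            let l_run = left[i..].iter().take_while(|&&k| k == key).count();
            let r_run = right[j..].iter().take_while(|&&k| k == key).count();
            pairs += l_run * r_run;
            i += l_run;
            j += r_run;
        }
    }
    pairs
}
```

For example, `count_join_pairs(&unique_keys(100), &repeated_keys(100, 10))` models a 1:10 join, where each key on one side matches ten rows on the other.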
   
   ## Usage
   
   ```bash
   cargo bench -p datafusion-physical-plan --features test_utils --bench sort_merge_join
   ```
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)

