Dandandan opened a new pull request, #21678:
URL: https://github.com/apache/datafusion/pull/21678

   ## Which issue does this PR close?
   
   <!-- Related to improving CPU utilization in parallel query execution (e.g. 
ClickBench). -->
   
   ## Rationale for this change
   
   In the non-preserve-order path, `RepartitionExec` currently creates one MPSC 
channel per output partition, shared by all N input senders. On every send each 
input task must:
   1. Acquire the output channel's state `Mutex`
   2. Go through the global `Gate` backpressure check
   
   This shared-channel state mutex is one of the hottest locks in parallel 
query execution and scales poorly with input parallelism.
   
   ## What changes are included in this PR?
   
   Use `partition_aware_channels` (which the preserve-order path already 
uses) for the non-preserve-order path too. Each (input, output) pair gets its 
own SPSC channel, so there are no shared senders and no cross-input contention 
on sends. On the consumer side, merge the N per-input streams with 
`futures::stream::select_all` (unordered, first-ready) instead of 
`StreamingMergeBuilder`. Coalescing is lifted out of the per-input streams 
and applied once on the merged output via a small `CoalescingOutputStream` 
wrapper, so observable batch sizes are unchanged.
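
The channel layout described above can be sketched with the standard library alone. This is an illustration, not the code in this PR: `std::sync::mpsc` channels and threads stand in for DataFusion's channels and async tasks, a `try_recv` polling loop stands in for `futures::stream::select_all`, and `Batch`, `partition_aware_grid`, `merge_and_coalesce`, and `run_repartition` are hypothetical names introduced only for this example. Routing by `row % n_outputs` is likewise an assumption made to keep the sketch small.

```rust
use std::sync::mpsc::{channel, Receiver, Sender, TryRecvError};
use std::thread;

/// Hypothetical stand-in for a record batch: a plain vector of row values.
type Batch = Vec<u64>;

/// Build one channel per (input, output) pair, so no sender is shared.
/// Returns `senders[input][output]` and `receivers[output][input]`.
fn partition_aware_grid(
    n_inputs: usize,
    n_outputs: usize,
) -> (Vec<Vec<Sender<Batch>>>, Vec<Vec<Receiver<Batch>>>) {
    let mut senders: Vec<Vec<Sender<Batch>>> = (0..n_inputs).map(|_| Vec::new()).collect();
    let mut receivers: Vec<Vec<Receiver<Batch>>> = (0..n_outputs).map(|_| Vec::new()).collect();
    for input in 0..n_inputs {
        for output in 0..n_outputs {
            let (tx, rx) = channel();
            senders[input].push(tx);
            receivers[output].push(rx);
        }
    }
    (senders, receivers)
}

/// Merge the per-input receivers of one output partition in whatever order
/// data shows up (no ordering across inputs), and coalesce once on the merged
/// stream into batches of exactly `target` rows (the last one may be shorter).
/// Busy-polling here is only a stand-in for async wakeups.
fn merge_and_coalesce(mut inputs: Vec<Receiver<Batch>>, target: usize) -> Vec<Batch> {
    assert!(target > 0, "target batch size must be positive");
    let mut out = Vec::new();
    let mut buf: Batch = Vec::new();
    while !inputs.is_empty() {
        let mut progressed = false;
        let mut i = 0;
        while i < inputs.len() {
            match inputs[i].try_recv() {
                Ok(batch) => {
                    progressed = true;
                    buf.extend(batch);
                    // Emit full batches; keep the remainder buffered.
                    while buf.len() >= target {
                        let rest = buf.split_off(target);
                        out.push(std::mem::replace(&mut buf, rest));
                    }
                    i += 1;
                }
                Err(TryRecvError::Empty) => i += 1,
                // This input is finished and drained: stop polling it.
                Err(TryRecvError::Disconnected) => {
                    inputs.swap_remove(i);
                }
            }
        }
        if !progressed {
            thread::yield_now();
        }
    }
    if !buf.is_empty() {
        out.push(buf);
    }
    out
}

/// Run one producer thread per input and one consumer thread per output.
fn run_repartition(inputs: Vec<Vec<Batch>>, n_outputs: usize, target: usize) -> Vec<Vec<Batch>> {
    assert!(n_outputs > 0, "need at least one output partition");
    let (senders, receivers) = partition_aware_grid(inputs.len(), n_outputs);
    let producers: Vec<_> = inputs
        .into_iter()
        .zip(senders)
        .map(|(batches, txs)| {
            thread::spawn(move || {
                for batch in batches {
                    // Split the batch by destination partition.
                    let mut routed = vec![Vec::new(); n_outputs];
                    for row in batch {
                        routed[(row % n_outputs as u64) as usize].push(row);
                    }
                    for (output, rows) in routed.into_iter().enumerate() {
                        if !rows.is_empty() {
                            // Only this input ever sends on txs[output].
                            txs[output].send(rows).unwrap();
                        }
                    }
                }
                // `txs` is dropped here, which closes this input's channels.
            })
        })
        .collect();
    let consumers: Vec<_> = receivers
        .into_iter()
        .map(|rxs| thread::spawn(move || merge_and_coalesce(rxs, target)))
        .collect();
    for p in producers {
        p.join().unwrap();
    }
    consumers.into_iter().map(|c| c.join().unwrap()).collect()
}
```

Calling `run_repartition(inputs, 2, 3)` with three inputs yields two output partitions. Row order within a partition depends on thread scheduling, which mirrors the unordered merge. Because coalescing runs after the merge, every emitted batch except possibly the last has exactly `target` rows.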
   
   This is a proof-of-concept to see whether removing MPSC contention shows a 
measurable benefit on ClickBench before considering more invasive changes (e.g. 
replacing the channel transport entirely).
   
   ## Are these changes tested?
   
   Covered by the existing repartition test suite (41 tests pass), including 
spill, dropped-output-stream, delayed-stream, and ordering-preservation.
   
   ## Are there any user-facing changes?
   
   No. Memory semantics, batch sizes, and ordering guarantees are unchanged.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
