zhuqi-lucas opened a new pull request, #21426:
URL: https://github.com/apache/datafusion/pull/21426

   ## Which issue does this PR close?
   
   Closes #21417
   
   ## Rationale for this change
   
   #21182 introduced `BufferExec` between `SortPreservingMergeExec` and 
`DataSourceExec` when sort elimination removes a `SortExec`. The buffer 
capacity was hardcoded to 64MB, which can cause I/O stalls for wide-row full 
scans.
   
   ## What changes are included in this PR?
   
   - Add `datafusion.execution.sort_pushdown_buffer_capacity` config option 
(default 1GB)
   - Replace hardcoded `BUFFER_CAPACITY_AFTER_SORT_ELIMINATION` constant with 
the config value
   - Update SLT test expectations for new default capacity
   
   ## How are these changes justified?
   
   **Why 1GB default:**
   - This is a maximum, not a pre-allocation: actual usage is bounded by 
partition data size
   - Uses strictly less memory than the `SortExec` it replaces (which buffers 
the entire partition)
   - `BufferExec` integrates with `MemoryPool`, so global memory limits are 
respected
   - 64MB was too small for wide-row scans (16-column TPC-H `SELECT *` queries 
showed I/O stalls)
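
   The "maximum, not pre-allocated" point can be sketched with a deliberately 
simplified model. This is an illustration only, not `BufferExec`'s actual 
accounting: the function names below are hypothetical, and the model assumes 
the buffer never holds more than the smaller of its capacity and the 
partition's data size.

```rust
// Hypothetical model for illustration; these helpers are not DataFusion APIs.
const MB: usize = 1024 * 1024;
const GB: usize = 1024 * MB;

/// Upper bound on bytes the buffer can hold at once under the assumed model:
/// the cap limits usage, but a small partition never fills the cap.
fn peak_buffered_bytes(capacity_bytes: usize, partition_bytes: usize) -> usize {
    capacity_bytes.min(partition_bytes)
}

/// Under the same assumption, the producer can only be blocked by the cap
/// when the partition holds more data than the buffer is allowed to keep.
fn producer_may_stall(capacity_bytes: usize, partition_bytes: usize) -> bool {
    partition_bytes > capacity_bytes
}
```

   In this model, a 200 MB partition uses at most 200 MB under a 1 GB cap and 
is never blocked by it, while a 64 MB cap limits usage to 64 MB and may block 
the producer.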
   
   **Why configurable:**
   - Different workloads have different optimal buffer sizes
   - Users in memory-constrained environments can reduce it
   - Users with wide tables or large row groups can increase it
   
   ## Are these changes tested?
   
   - Existing SLT Test G verifies that `BufferExec` appears in the plan with 
the correct capacity
   - Config integration tested via existing config framework
   
   ## Are there any user-facing changes?
   
   New config option: `datafusion.execution.sort_pushdown_buffer_capacity` 
(default: 1GB)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

