zhuqi-lucas opened a new pull request, #21426: URL: https://github.com/apache/datafusion/pull/21426
## Which issue does this PR close?

Closes #21417

## Rationale for this change

#21182 introduced `BufferExec` between `SortPreservingMergeExec` and `DataSourceExec` when sort elimination removes a `SortExec`. The buffer capacity was hardcoded to 64MB, which can cause I/O stalls for wide-row full scans.

## What changes are included in this PR?

- Add `datafusion.execution.sort_pushdown_buffer_capacity` config option (default 1GB)
- Replace the hardcoded `BUFFER_CAPACITY_AFTER_SORT_ELIMINATION` constant with the config value
- Update SLT test expectations for the new default capacity

## How are these changes justified?

**Why 1GB default:**

- This is a maximum, not a pre-allocation: actual usage is bounded by partition data size
- Strictly less memory than the `SortExec` it replaces (which buffers the entire partition)
- `BufferExec` integrates with `MemoryPool`, so global memory limits are respected
- 64MB was too small for wide-row scans (16-column TPC-H `SELECT *` queries showed I/O stalls)

**Why configurable:**

- Different workloads have different optimal buffer sizes
- Users in memory-constrained environments can reduce it
- Users with wide tables or large row groups can increase it

## Are these changes tested?

- Existing SLT Test G verifies that `BufferExec` appears in the plan with the correct capacity
- Config integration is tested via the existing config framework

## Are there any user-facing changes?

New config option: `datafusion.execution.sort_pushdown_buffer_capacity` (default: 1GB)
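To illustrate the "maximum, not pre-allocated" argument above, here is a minimal, self-contained sketch of a byte-budgeted buffer. It is not DataFusion's `BufferExec`: the `BoundedBuffer` type, its fields, and its methods are hypothetical names invented for this example, and it leaves out `MemoryPool` integration and async backpressure. The only point it makes is that the capacity acts as a ceiling on buffered bytes, so memory use follows the data actually buffered.

```rust
use std::collections::VecDeque;

/// Hypothetical byte-budgeted queue (illustration only, not DataFusion code).
/// `capacity` is a ceiling on buffered bytes, not an up-front allocation.
struct BoundedBuffer {
    capacity: usize,
    used: usize,
    batches: VecDeque<Vec<u8>>,
}

impl BoundedBuffer {
    fn new(capacity: usize) -> Self {
        // Nothing is reserved here; memory grows only as batches arrive.
        Self { capacity, used: 0, batches: VecDeque::new() }
    }

    /// Accepts the batch if it fits in the remaining budget; otherwise
    /// hands it back so the producer can wait for the consumer to drain.
    fn try_push(&mut self, batch: Vec<u8>) -> Result<(), Vec<u8>> {
        if self.used + batch.len() > self.capacity {
            return Err(batch);
        }
        self.used += batch.len();
        self.batches.push_back(batch);
        Ok(())
    }

    /// Removes the oldest batch and returns its bytes to the budget.
    fn pop(&mut self) -> Option<Vec<u8>> {
        let batch = self.batches.pop_front()?;
        self.used -= batch.len();
        Some(batch)
    }
}

fn main() {
    const GB: usize = 1024 * 1024 * 1024;
    let mut buf = BoundedBuffer::new(GB);

    // A small partition uses only what it holds, far below the 1GB ceiling.
    assert!(buf.try_push(vec![0u8; 4096]).is_ok());
    println!("used {} of {} bytes", buf.used, buf.capacity);

    // Draining the buffer releases the budget again.
    let drained = buf.pop().expect("one batch was buffered");
    println!("drained {} bytes, {} still in use", drained.len(), buf.used);
}
```

In this sketch a producer whose batch is rejected would retry after the consumer calls `pop`. That rejection is the stall a small ceiling causes when individual batches are large, which is the wide-row case described above.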
