Copilot commented on code in PR #21426:
URL: https://github.com/apache/datafusion/pull/21426#discussion_r3042616134


##########
datafusion/physical-optimizer/src/pushdown_sort.rs:
##########
@@ -102,6 +88,8 @@ impl PhysicalOptimizerRule for PushdownSort {
             return Ok(plan);
         }
 
+        let buffer_capacity = config.execution.sort_pushdown_buffer_capacity;
+

Review Comment:
   `sort_pushdown_buffer_capacity` can be set to `0` (it’s a `usize`), but 
`BufferExec` ultimately passes this into `MemoryBufferedStream`, which uses 
`Semaphore::new(capacity)` and then `acquire_many_owned(size.min(capacity) as 
u32)`. With `capacity == 0`, `acquire_many_owned(0)` never blocks, effectively 
turning the buffer into an unbounded prefetcher (and can grow without bound 
when using the default unbounded memory pool). Consider validating this option 
(e.g., require > 0 and return a config error) or treating `0` as “disable 
inserting BufferExec” (similar to `hash_join_buffering_capacity`).



##########
datafusion/common/src/config.rs:
##########
@@ -557,6 +557,19 @@ config_namespace! {
         /// batches and merged.
         pub sort_in_place_threshold_bytes: usize, default = 1024 * 1024
 
+        /// Maximum buffer capacity (in bytes) per partition for BufferExec
+        /// inserted during sort pushdown optimization.
+        ///
+        /// When PushdownSort eliminates a SortExec under 
SortPreservingMergeExec,
+        /// a BufferExec is inserted to replace SortExec's buffering role. This
+        /// prevents I/O stalls by allowing the scan to run ahead of the merge.
+        ///
+        /// This uses strictly less memory than the SortExec it replaces (which
+        /// buffers the entire partition). The buffer respects the global 
memory
+        /// pool limit. Setting this to a large value is safe — actual memory
+        /// usage is bounded by partition size and global memory limits.

Review Comment:
   The doc comment says “Setting this to a large value is safe — actual memory 
usage is bounded by partition size and global memory limits.” However, the 
default `RuntimeEnv` uses an `UnboundedMemoryPool` (no global limit) unless the 
user explicitly configures one, so a large value can materially increase peak 
memory usage / OOM risk. Suggest rewording to clarify it respects configured 
memory limits when present, and that large values may increase peak memory in 
unbounded configurations (especially with many partitions).
   ```suggestion
           /// buffers the entire partition). The buffer respects configured 
global
           /// memory pool limits when present, and actual usage is also 
bounded by
           /// partition size. In unbounded-memory configurations, however, 
larger
           /// values may increase peak memory usage and OOM risk, especially 
when
           /// many partitions are buffered concurrently.
   ```



##########
datafusion/common/src/config.rs:
##########
@@ -557,6 +557,19 @@ config_namespace! {
         /// batches and merged.
         pub sort_in_place_threshold_bytes: usize, default = 1024 * 1024
 
+        /// Maximum buffer capacity (in bytes) per partition for BufferExec
+        /// inserted during sort pushdown optimization.
+        ///
+        /// When PushdownSort eliminates a SortExec under 
SortPreservingMergeExec,
+        /// a BufferExec is inserted to replace SortExec's buffering role. This
+        /// prevents I/O stalls by allowing the scan to run ahead of the merge.
+        ///
+        /// This uses strictly less memory than the SortExec it replaces (which
+        /// buffers the entire partition). The buffer respects the global 
memory
+        /// pool limit. Setting this to a large value is safe — actual memory
+        /// usage is bounded by partition size and global memory limits.
+        pub sort_pushdown_buffer_capacity: usize, default = 1024 * 1024 * 1024

Review Comment:
   This new config option appears user-facing, but 
`docs/source/user-guide/configs.md` (the config reference table) doesn’t 
currently list `datafusion.execution.sort_pushdown_buffer_capacity`. Please add 
it there so users can discover the setting and its default.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to