2010YOUY01 commented on code in PR #18207:
URL: https://github.com/apache/datafusion/pull/18207#discussion_r2458601595
##########
datafusion/common/src/config.rs:
##########
@@ -517,6 +517,20 @@ config_namespace! {
/// batches and merged.
pub sort_in_place_threshold_bytes: usize, default = 1024 * 1024
+ /// Maximum size in bytes for individual spill files before rotating to a new file.
+ ///
+ /// When operators spill data to disk (e.g., RepartitionExec, SortExec), they write
+ /// multiple batches to the same file until this size limit is reached, then rotate
+ /// to a new file. This reduces syscall overhead compared to one-file-per-batch
+ /// while preventing files from growing too large.
+ ///
+ /// A larger value reduces file creation overhead but may hold more disk space.
+ /// A smaller value creates more files but allows finer-grained space reclamation
+ /// (especially in LIFO mode where files are truncated after reading).
+ ///
+ /// Default: 100 MB
Review Comment:
Maybe a 128 MB default would satisfy folks with alignment OCD (half joking)
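For concreteness, here is a minimal Rust sketch of the rotation behavior the doc comment describes, with the limit written as 128 MiB (2^27 bytes) as suggested above. It is an illustration under stated assumptions, not DataFusion's spill implementation: `RotatingSpillWriter`, `write_batch`, and `MAX_SPILL_FILE_SIZE_BYTES` are hypothetical names, "files" are in-memory buffers so it runs without disk I/O, it assumes rotation happens once a file has reached the limit (so a file can overshoot by one batch), and it does not model LIFO truncation. The hunk also does not show whether "100 MB" is expressed as `100 * 1024 * 1024` or a decimal value.

```rust
// Hypothetical sketch: these names are illustrative, not DataFusion APIs.
// "Files" are in-memory Vec<u8> buffers so the example runs without disk I/O.

/// 128 MiB, i.e. 2^27 bytes (the suggested power-of-two default).
const MAX_SPILL_FILE_SIZE_BYTES: usize = 128 * 1024 * 1024;

/// Appends batches to the current file; once that file has reached the size
/// limit, the next batch goes to a freshly created file.
struct RotatingSpillWriter {
    max_file_size_bytes: usize,
    files: Vec<Vec<u8>>,
}

impl RotatingSpillWriter {
    fn new(max_file_size_bytes: usize) -> Self {
        Self { max_file_size_bytes, files: Vec::new() }
    }

    fn write_batch(&mut self, batch: &[u8]) {
        // Open a new file if none exists yet or the current one is full.
        let needs_new_file = match self.files.last() {
            None => true,
            Some(file) => file.len() >= self.max_file_size_bytes,
        };
        if needs_new_file {
            self.files.push(Vec::new());
        }
        // A batch is never split, so a file may exceed the limit by at most
        // one batch.
        self.files
            .last_mut()
            .expect("a current file was just ensured")
            .extend_from_slice(batch);
    }

    fn file_sizes(&self) -> Vec<usize> {
        self.files.iter().map(Vec::len).collect()
    }
}

fn main() {
    // 128 MiB is a power of two; 100 MiB (104_857_600 bytes) is not.
    assert_eq!(MAX_SPILL_FILE_SIZE_BYTES, 1 << 27);
    assert_eq!(100 * 1024 * 1024, 104_857_600);

    // Tiny limit so rotation is visible: 10-byte limit, 4-byte batches.
    let mut writer = RotatingSpillWriter::new(10);
    for _ in 0..5 {
        writer.write_batch(&[0u8; 4]);
    }
    println!("{:?}", writer.file_sizes()); // [12, 8]
}
```

The trade-off in the doc comment shows up directly here: raising the limit means fewer pushes of new files, while lowering it yields more, smaller files that could each be released independently.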
--
This is an automated message from the Apache Git Service.