2010YOUY01 opened a new issue, #16908: URL: https://github.com/apache/datafusion/issues/16908
### Is your feature request related to a problem or challenge?

This is a follow-up to: https://github.com/apache/datafusion/pull/15700

Part of https://github.com/apache/datafusion/issues/15271

#15700 introduces a comprehensive approach for external sorts: if it is not possible to merge all spills in one pass, it merges as many spills as possible, re-spills the result, and iterates until the final output can be produced in a single merge.

Note that it also uses estimations for memory accounting. Under extreme edge cases these estimations can be inaccurate, causing the actual memory usage to exceed the configured memory limit.

To prevent such unintentional cases and to enable performance tuning (merging a smaller number of batches at once is more cache-friendly/faster), we can introduce an additional configuration option:

`max_merge_degree: Maximum number of streams to merge during re-spills in external sorts`

The original context: https://github.com/apache/datafusion/pull/15700#issuecomment-3034804482

### Describe the solution you'd like

This previous attempt can be used as a reference for the solution: https://github.com/apache/datafusion/pull/15610

It has to be restructured to build upon #15700.

### Describe alternatives you've considered

_No response_

### Additional context

_No response_
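
For illustration only, the effect of capping the merge fan-in can be sketched as below. This is not DataFusion's implementation: the function names `k_way_merge` and `merge_spills` are hypothetical, in-memory `Vec<i32>` runs stand in for spill files, and memory accounting is ignored. The sketch only shows how a `max_merge_degree` bound would turn one wide merge into several narrower merge-and-re-spill passes.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, VecDeque};

/// Merge already-sorted runs into one sorted run using a min-heap.
/// (Hypothetical helper; each run stands in for one spill file.)
fn k_way_merge(runs: Vec<Vec<i32>>) -> Vec<i32> {
    let mut heap = BinaryHeap::new();
    // Seed the heap with the first element of every non-empty run.
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&first) = run.first() {
            heap.push(Reverse((first, run_idx, 0usize)));
        }
    }
    let mut out = Vec::with_capacity(runs.iter().map(Vec::len).sum());
    // Pop the smallest head, then refill from the run it came from.
    while let Some(Reverse((value, run_idx, pos))) = heap.pop() {
        out.push(value);
        if let Some(&next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next, run_idx, pos + 1)));
        }
    }
    out
}

/// Merge at most `max_merge_degree` runs at a time, "re-spilling" each
/// intermediate result, until one final merge can produce the output.
/// Returns the sorted output and the number of re-spills performed.
fn merge_spills(spills: Vec<Vec<i32>>, max_merge_degree: usize) -> (Vec<i32>, usize) {
    assert!(max_merge_degree >= 2, "need at least two inputs per merge");
    let mut pending: VecDeque<Vec<i32>> = spills.into();
    let mut respills = 0;
    while pending.len() > max_merge_degree {
        // Take the oldest runs, merge them, and queue the result again.
        let group: Vec<Vec<i32>> = pending.drain(..max_merge_degree).collect();
        pending.push_back(k_way_merge(group));
        respills += 1;
    }
    // The remaining runs fit within the cap: final merge.
    (k_way_merge(pending.into()), respills)
}

fn main() {
    let spills = vec![vec![1, 9], vec![2, 8], vec![3, 7], vec![4, 6], vec![5]];
    let (sorted, respills) = merge_spills(spills, 2);
    println!("{sorted:?} after {respills} re-spills");
}
```

A smaller cap trades extra re-spill passes for a narrower merge, which is the tuning knob the proposed option would expose.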