alamb commented on code in PR #15355:
URL: https://github.com/apache/datafusion/pull/15355#discussion_r2009075716
##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -230,9 +219,14 @@ struct ExternalSorter {
     /// if `Self::in_mem_batches` are sorted
     in_mem_batches_sorted: bool,
-    /// If data has previously been spilled, the locations of the
-    /// spill files (in Arrow IPC format)
-    spills: Vec<RefCountedTempFile>,
+    /// During external sorting, in-memory intermediate data will be appended to
+    /// this file incrementally. Once finished, this file will be moved to [`Self::finished_spill_files`].
+    in_progress_spill_file: Option<InProgressSpillFile>,
+    /// If data has previously been spilled, the locations of the spill files (in
+    /// Arrow IPC format)
+    /// Within the same spill file, the data might be chunked into multiple batches,
+    /// and ordered by sort keys.
+    finished_spill_files: Vec<RefCountedTempFile>,

Review Comment:
   The different semantics for different operations makes sense to me.

   I was thinking more mechanically, like just storing the `Vec<RefCountedTempFile>` as a field on `SortManager` and allowing Sort and Hash, etc. to access / manipulate it as required.

   I think it is fine to consider this in a future PR as well.
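   For illustration only, here is a minimal sketch of the "store the list on a shared manager" idea from the comment above. It is not code from this PR or from DataFusion: `SpillFile` is a stand-in for `RefCountedTempFile`, and the `SortManager` methods (`push_finished`, `finished`, `take_finished`) are hypothetical names chosen for the example.

```rust
use std::path::PathBuf;
use std::sync::Arc;

/// Stand-in for DataFusion's `RefCountedTempFile` (assumption: only the
/// path is modeled here; the real type also manages the temp file's lifetime).
#[derive(Debug, Clone)]
struct SpillFile {
    path: Arc<PathBuf>,
}

/// Hypothetical manager that owns the finished spill files, so that each
/// operator (sort, hash aggregate, ...) does not need its own `Vec` field.
#[derive(Debug, Default)]
struct SortManager {
    finished_spill_files: Vec<SpillFile>,
}

impl SortManager {
    /// Record a spill file once an operator has finished writing it.
    fn push_finished(&mut self, file: SpillFile) {
        self.finished_spill_files.push(file);
    }

    /// Read-only view for operators that only need to inspect the files.
    fn finished(&self) -> &[SpillFile] {
        &self.finished_spill_files
    }

    /// Hand all finished files to the caller (for example, a final merge),
    /// leaving the manager's list empty.
    fn take_finished(&mut self) -> Vec<SpillFile> {
        std::mem::take(&mut self.finished_spill_files)
    }
}

fn main() {
    let mut manager = SortManager::default();
    manager.push_finished(SpillFile { path: Arc::new(PathBuf::from("spill_0.arrow")) });
    manager.push_finished(SpillFile { path: Arc::new(PathBuf::from("spill_1.arrow")) });
    println!("{} spill files recorded", manager.finished().len());

    // An operator takes ownership of the files when it is ready to read them back.
    let to_merge = manager.take_finished();
    for file in &to_merge {
        println!("would merge {}", file.path.display());
    }
    println!("{} files left in manager", manager.finished().len());
}
```

   In this sketch each operator decides what the files mean (sorted runs for a sort, partitions for a hash operator), while the manager only stores and hands out the list, which is the "mechanical" sharing described in the comment.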