2010YOUY01 commented on PR #15355: URL: https://github.com/apache/datafusion/pull/15355#issuecomment-2747190211
> > 3. After we have collected 1MB of merged batch, one spill will be triggered. And this 1MB space will be cleared, the merging can continue.
> >
> > **Inefficiency:** Now `ExternalSorter` will create a new spill file for those 1MB merged batches, after spilling all intermediates, all spilled files will be merged at once, then there are too many files to merge.
> >
> > **Ideal case:** All batches in a single sorted run can be incrementally appended to a single file.
>
> It seems to be a regression introduced by #14823.

That's true, so I feel obligated to fix it.

----

Thank you for the review @alamb and @Kontinuation, I have addressed the review comments.
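To illustrate the inefficiency described in the quote, here is a minimal Rust sketch built on a toy in-memory model. Each "file" is just a `Vec` of spilled batches. `SpillSketch`, `spill_batch`, `finish_run` and `spill_runs` are hypothetical names and are not DataFusion's `ExternalSorter` API. The sketch compares two strategies: creating one file per spilled batch, and appending every batch of a sorted run to that run's file.

```rust
/// Hypothetical in-memory stand-in for spill files (not DataFusion's API):
/// each entry in `files` is one "file" holding a list of spilled batches,
/// and each batch is a sorted Vec of keys.
struct SpillSketch {
    files: Vec<Vec<Vec<u32>>>,
    /// true: append all batches of one sorted run to a single file.
    /// false: create a new file for every spilled batch.
    one_file_per_run: bool,
    /// Whether a file for the current sorted run is already open.
    run_file_open: bool,
}

impl SpillSketch {
    fn new(one_file_per_run: bool) -> Self {
        SpillSketch { files: Vec::new(), one_file_per_run, run_file_open: false }
    }

    /// Spill one merged batch (e.g. ~1MB) produced while merging a sorted run.
    fn spill_batch(&mut self, batch: Vec<u32>) {
        if self.one_file_per_run && self.run_file_open {
            // Incrementally append to the current run's file.
            self.files.last_mut().expect("run file is open").push(batch);
        } else {
            // Start a new file: always in per-batch mode, or for the first batch of a run.
            self.files.push(vec![batch]);
            self.run_file_open = true;
        }
    }

    /// Called once a sorted run is fully spilled; the next batch starts a new file.
    fn finish_run(&mut self) {
        self.run_file_open = false;
    }

    /// Number of files the final merge would have to open.
    fn file_count(&self) -> usize {
        self.files.len()
    }
}

/// Spill every batch of every sorted run and return the resulting "files".
fn spill_runs(runs: &[Vec<Vec<u32>>], one_file_per_run: bool) -> SpillSketch {
    let mut spill = SpillSketch::new(one_file_per_run);
    for run in runs {
        for batch in run {
            spill.spill_batch(batch.clone());
        }
        spill.finish_run();
    }
    spill
}

fn main() {
    // Two sorted runs, each emitted by the merge as three consecutive batches.
    let runs = vec![
        vec![vec![1, 2], vec![3, 4], vec![5, 6]],
        vec![vec![0, 7], vec![8, 9], vec![10, 11]],
    ];
    let per_batch = spill_runs(&runs, false);
    let per_run = spill_runs(&runs, true);
    println!("one file per batch: {} files to merge", per_batch.file_count());
    println!("one file per run:   {} files to merge", per_run.file_count());
}
```

With two runs of three batches each, the per-batch strategy leaves six files for the final merge. The per-run strategy leaves two. Each per-run file stays sorted end to end, because a run's batches are emitted in order.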