2010YOUY01 commented on PR #18207:
URL: https://github.com/apache/datafusion/pull/18207#issuecomment-3434946419

   > > This PR sets a size limit on spill files: when a file exceeds the
   > > threshold, the spiller rotates to a new file. I'm wondering, why this
   > > design? The spill writer and reader can now do streaming reads and
   > > writes, so a large spill file usually won't be an issue, unless more
   > > parallelism is needed somewhere.
   > 
   > The issue with using a single FIFO file is that you accumulate dead data,
   > bloating disk usage considerably. The idea is to cap that at, say, 100MB
   > and then start a new file, so that once all of the original file has been
   > consumed we can garbage collect it.
   
   This makes a lot of sense: operators should release disk space sooner when
   possible.
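
To make the quoted idea concrete, here is a minimal in-memory sketch of a size-capped, rotating FIFO spill queue. It is only an illustration under stated assumptions: `SpillFile`, `RotatingSpill`, and `max_file_bytes` are hypothetical names, not DataFusion APIs, and a "file" is modeled as a queue of byte buffers rather than anything on disk.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for one spill file on disk.
struct SpillFile {
    /// Batches written to this file that have not been read yet.
    batches: VecDeque<Vec<u8>>,
    /// Total bytes ever written; the space is only reclaimed when the
    /// whole file is deleted, so consumed batches still count here.
    bytes_on_disk: usize,
}

/// Hypothetical FIFO spill queue that starts a new file once the
/// current one reaches `max_file_bytes`.
struct RotatingSpill {
    max_file_bytes: usize,
    files: VecDeque<SpillFile>,
}

impl RotatingSpill {
    fn new(max_file_bytes: usize) -> Self {
        Self { max_file_bytes, files: VecDeque::new() }
    }

    /// Append a batch, rotating to a fresh file if the current one is full.
    fn push(&mut self, batch: Vec<u8>) {
        let need_new = match self.files.back() {
            Some(f) => f.bytes_on_disk >= self.max_file_bytes,
            None => true,
        };
        if need_new {
            self.files.push_back(SpillFile { batches: VecDeque::new(), bytes_on_disk: 0 });
        }
        let file = self.files.back_mut().expect("a write target exists");
        file.bytes_on_disk += batch.len();
        file.batches.push_back(batch);
    }

    /// Read the oldest batch. A file is deleted as soon as it is fully
    /// consumed and is no longer the write target.
    fn pop(&mut self) -> Option<Vec<u8>> {
        loop {
            let front = self.files.front_mut()?;
            let batch = front.batches.pop_front();
            let drained = front.batches.is_empty();
            let rotated_away = self.files.len() > 1;
            if drained && rotated_away {
                self.files.pop_front(); // "garbage collect" the dead file
            }
            if batch.is_some() || !rotated_away {
                return batch;
            }
        }
    }

    /// Bytes currently held on disk, including already-consumed (dead) data.
    fn disk_usage(&self) -> usize {
        self.files.iter().map(|f| f.bytes_on_disk).sum()
    }
}
```

With a single unbounded file, `disk_usage` could only grow until the entire stream had been consumed. With rotation, dead data is bounded by roughly one file's worth, at the cost of tracking several files.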
   
   I will review it soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

