alamb commented on issue #4564:
URL: https://github.com/apache/arrow-datafusion/issues/4564#issuecomment-1344671491

   > Given shuffling IPC files seems specific to Ballista, it would perhaps make more sense to be something that lives within that codebase? So a ShuffleManager or PersistFileManager as you suggest
   
   One reason it might make sense to extend `DiskManager` is so that Ballista / 
DataFusion could have a more complete view of disk file usage. For example, if 
Ballista wanted to be able to cap the total disk space used across IPC files 
and spill files it might be hard to do so with two different disk managers.
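
To illustrate the point about a single cap, here is a minimal sketch of one shared byte budget that both kinds of files would draw from. `DiskBudget`, `FileKind`, `try_reserve` and `release` are hypothetical names invented for this example; none of them are part of DataFusion's or Ballista's actual APIs.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical: the kinds of files one shared manager might account for.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FileKind {
    Spill,
    ShuffleIpc,
}

/// Hypothetical shared budget: a single cap covering spill and IPC files.
pub struct DiskBudget {
    limit_bytes: u64,
    used_bytes: AtomicU64,
}

impl DiskBudget {
    pub fn new(limit_bytes: u64) -> Self {
        Self { limit_bytes, used_bytes: AtomicU64::new(0) }
    }

    /// Reserve `bytes` for a file of the given kind. Fails, leaving the
    /// accounting unchanged, if the reservation would exceed the cap.
    pub fn try_reserve(&self, kind: FileKind, bytes: u64) -> Result<(), String> {
        let mut current = self.used_bytes.load(Ordering::Relaxed);
        loop {
            let next = current
                .checked_add(bytes)
                .filter(|n| *n <= self.limit_bytes)
                .ok_or_else(|| {
                    format!(
                        "{:?} file of {} bytes exceeds the disk cap of {} bytes ({} in use)",
                        kind, bytes, self.limit_bytes, current
                    )
                })?;
            // Retry if another thread changed the counter in the meantime.
            match self.used_bytes.compare_exchange_weak(
                current,
                next,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => return Ok(()),
                Err(observed) => current = observed,
            }
        }
    }

    /// Return `bytes` to the budget, e.g. after a file has been deleted.
    pub fn release(&self, bytes: u64) {
        let _ = self
            .used_bytes
            .fetch_update(Ordering::AcqRel, Ordering::Relaxed, |used| {
                Some(used.saturating_sub(bytes))
            });
    }

    /// Bytes currently reserved across all file kinds.
    pub fn used(&self) -> u64 {
        self.used_bytes.load(Ordering::Relaxed)
    }
}
```

With two independent managers, each would only see its own counter, so neither could enforce a total across both kinds of files.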
   
   > But if I extend the capabilities of the DiskManager, it means it should never be disabled. the spilling disable logic will have to be moved out from the DiskManager.
   
   I suspect you could extend the API to distinguish between "is spilling 
allowed" and "please keep track of and clean up these files made during 
execution".
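
As a rough sketch of what that distinction could look like (an illustration only, not the existing `DiskManager` API; `TrackingDiskManager`, `DiskError` and every method name below are hypothetical), the manager stays enabled for file tracking while a separate flag gates spilling:

```rust
use std::path::PathBuf;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum DiskError {
    SpillingDisabled,
}

/// Hypothetical manager that is always available for tracking files,
/// with spilling controlled by an independent flag.
pub struct TrackingDiskManager {
    spilling_allowed: bool,
    tracked: Vec<PathBuf>,
}

impl TrackingDiskManager {
    pub fn new(spilling_allowed: bool) -> Self {
        Self { spilling_allowed, tracked: Vec::new() }
    }

    /// "Is spilling allowed?" Operators such as a sort would check this.
    pub fn spilling_allowed(&self) -> bool {
        self.spilling_allowed
    }

    /// Register a spill file; refused when spilling is disabled.
    pub fn register_spill_file(&mut self, path: impl Into<PathBuf>) -> Result<(), DiskError> {
        if !self.spilling_allowed {
            return Err(DiskError::SpillingDisabled);
        }
        self.tracked.push(path.into());
        Ok(())
    }

    /// "Please keep track of and clean up this file": always accepted,
    /// regardless of the spilling flag (e.g. for shuffle output).
    pub fn track_file(&mut self, path: impl Into<PathBuf>) {
        self.tracked.push(path.into());
    }

    pub fn tracked_files(&self) -> &[PathBuf] {
        &self.tracked
    }

    /// Delete every tracked file, ignoring files that are already gone.
    /// Returns the number of entries that were being tracked.
    pub fn clean_up(&mut self) -> usize {
        let count = self.tracked.len();
        for path in self.tracked.drain(..) {
            let _ = std::fs::remove_file(&path);
        }
        count
    }
}
```

In this shape, disabling spilling only affects `register_spill_file`; tracking and cleanup of other execution files keep working.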
   
   I don't have a strong preference for how you do it, as long as there is 
still a way to limit the memory used by SortExec. There is a test for this in 
https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/memory_limit.rs
which should help guide the implementation.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
