alamb commented on issue #4564: URL: https://github.com/apache/arrow-datafusion/issues/4564#issuecomment-1344671491
> Given shuffling IPC files seems specific to Ballista, it would perhaps make more sense to be something that lives within that codebase? So a ShuffleManager or PersistFileManager as you suggest

One reason it might make sense to extend `DiskManager` is so that Ballista / DataFusion could have a more complete view of disk file usage. For example, if Ballista wanted to cap the total disk space used across IPC files and spill files, it might be hard to do so with two different disk managers.

> But if I extend the capabilities of the DiskManager, it means it should never be disabled. the spilling disable logic will have to be moved out from the DiskManager.

I suspect you could extend the API to distinguish between "is spilling allowed" and "please keep track of and clean up these files made during execution" as well.

I don't have a strong preference for how you do it as long as there is still a way to limit memory used by SortExec. There is a test for this in https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/memory_limit.rs so that should help guide the implementation.
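To make the "is spilling allowed" vs. "track and clean up these files" split concrete, here is a rough sketch of what such an API could look like. This is not DataFusion's actual `DiskManager` API: `TrackedDiskManager`, `FileKind`, `register`, and `cleanup` are hypothetical names I made up for illustration, and the byte cap is just one possible way to budget IPC and spill files together.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical categories of on-disk files produced during execution.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum FileKind {
    /// Temporary files written when an operator (e.g. a sort) exceeds its memory budget.
    Spill,
    /// Shuffle IPC files, as written by Ballista.
    Shuffle,
}

/// Hypothetical manager (NOT the real `DiskManager`): the spill switch and
/// the file tracking are independent, so tracking keeps working when
/// spilling is disabled.
#[derive(Debug)]
pub struct TrackedDiskManager {
    spill_allowed: bool,
    max_total_bytes: Option<u64>,
    files: HashMap<PathBuf, (FileKind, u64)>,
}

impl TrackedDiskManager {
    pub fn new(spill_allowed: bool, max_total_bytes: Option<u64>) -> Self {
        Self { spill_allowed, max_total_bytes, files: HashMap::new() }
    }

    /// Total bytes across all tracked files, spill and shuffle alike.
    pub fn total_bytes(&self) -> u64 {
        self.files.values().map(|(_, bytes)| *bytes).sum()
    }

    /// Record a file of the given kind and size. Spill files are rejected
    /// when spilling is disabled; any file is rejected if it would push the
    /// combined total over the cap.
    pub fn register(&mut self, kind: FileKind, path: PathBuf, bytes: u64) -> Result<(), String> {
        if kind == FileKind::Spill && !self.spill_allowed {
            return Err("spilling is disabled".to_string());
        }
        // Re-registering a path replaces its previously recorded size.
        let previous = self.files.get(&path).map(|(_, b)| *b).unwrap_or(0);
        let new_total = (self.total_bytes() - previous).saturating_add(bytes);
        if let Some(cap) = self.max_total_bytes {
            if new_total > cap {
                return Err(format!("disk budget exceeded: {} > {} bytes", new_total, cap));
            }
        }
        self.files.insert(path, (kind, bytes));
        Ok(())
    }

    /// Stop tracking every file and delete it on a best-effort basis.
    /// Returns how many files were being tracked.
    pub fn cleanup(&mut self) -> usize {
        let tracked = self.files.len();
        for (path, _) in self.files.drain() {
            // Deletion errors (e.g. file already gone) are deliberately ignored.
            let _ = std::fs::remove_file(path);
        }
        tracked
    }
}
```

With something shaped like this, a manager created with `spill_allowed = false` would still accept and clean up shuffle files, and a single cap would cover both kinds of files.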
