Kontinuation commented on issue #17334: URL: https://github.com/apache/datafusion/issues/17334#issuecomment-3570067327
I'm also looking forward to the cooperative spilling feature. It is important for projects such as [Comet](https://github.com/apache/datafusion-comet) that want to implement [Photon-like memory management](https://people.eecs.berkeley.edu/~matei/papers/2022/sigmod_photon.pdf) and reduce the likelihood of allocation failures (see the related issue https://github.com/apache/datafusion-comet/issues/949).

A similar DataFusion-based Spark accelerator, [Apache Auron](https://github.com/apache/auron), re-implemented memory-intensive operators such as sort, aggregation, and join [on its own](https://github.com/apache/auron/tree/11164c92c735dcf0204330a3e33621753005f3e9/native-engine/datafusion-ext-plans/src) so that they use [its own memory manager](https://github.com/apache/auron/blob/11164c92c735dcf0204330a3e33621753005f3e9/native-engine/auron-memmgr/src/lib.rs) and work around the limitations of DataFusion's memory management API.

A bit of history: the initial memory management [proposal](https://docs.google.com/document/d/1BT5HH-2sKq-Jxo51PNE6l9NNd_F-FyyYcyC3SKTnkIA/edit) and [implementation](https://github.com/apache/datafusion/pull/1526) did support cooperative spilling. However, a later [simplification](https://github.com/apache/datafusion/pull/4522) removed that feature.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
