rluvaton commented on issue #15028: URL: https://github.com/apache/datafusion/issues/15028#issuecomment-3288929953
I've verified that this still fails on the latest main. I expected this might happen, which is why I left this TODO in the code: https://github.com/apache/datafusion/blob/af7587b8660560791417f0fd5a87521e3478e3d2/datafusion/physical-plan/src/sorts/multi_level_merge.rs#L191

There are a couple of possible solutions:

1. Limit the final merge degree. Possible, but I don't like it because it can lead to under-utilization.
2. Always spill the last degree (if we are merging spill files and not existing streams). Possible, but I don't like it either, because we write the whole data again, which can be very bad for performance.
3. Make the last merge spillable so it can spill if needed. This will only work if you can trigger spills, which you currently can't.

However, when I changed it locally to always spill the last degree, it still fails, but **not because of sort**.

I also changed the memory pool in the reproduction code:

```rust
use std::num::NonZeroUsize;
use datafusion::execution::memory_pool::{FairSpillPool, TrackConsumersPool};

let tracked_fair = TrackConsumersPool::new(
    FairSpillPool::new(100 * 1024 * 1024), // 100 MB limit
    NonZeroUsize::new(1000).unwrap(),      // report up to 1000 top consumers
);
```

and it failed with this (no sort this time, after always spilling the last level):

```
Resources exhausted: Additional allocation failed with top memory consumers (across reservations) as:
  ParquetSink(ArrowColumnWriter)#6(can spill: false) consumed 95.0 MB, peak 95.0 MB,
  ParquetSink(ArrowColumnWriter)#9(can spill: false) consumed 4.2 MB, peak 4.2 MB,
  ParquetSink(ArrowColumnWriter)#10(can spill: false) consumed 204.7 KB, peak 204.7 KB,
  ParquetSink(ArrowColumnWriter)#5(can spill: false) consumed 95.2 KB, peak 95.2 KB,
  ParquetSink(ArrowColumnWriter)#7(can spill: false) consumed 256.0 B, peak 256.0 B,
  ParquetSink(ArrowColumnWriter)#8(can spill: false) consumed 256.0 B, peak 256.0 B,
  ParquetSink(SerializedFileWriter)#4(can spill: false) consumed 0.0 B, peak 0.0 B.
```
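
To make the trade-off in option 1 (limiting the final merge degree) concrete, here is a rough sketch of the arithmetic. It is not the code in `multi_level_merge.rs`; the function names and the idea of a fixed per-stream buffer size are assumptions made purely for illustration. It shows how capping the merge degree to fit a memory budget increases the number of merge passes over the spill files.

```rust
/// Hypothetical helper (not a DataFusion API): the largest number of spill
/// files that can be merged at once if every open stream needs
/// `per_stream_bytes` of buffer and the merge may use `budget_bytes` in total.
fn max_merge_degree(budget_bytes: usize, per_stream_bytes: usize) -> usize {
    if per_stream_bytes == 0 {
        // Assumption: a zero-sized stream never limits the degree.
        return usize::MAX;
    }
    budget_bytes / per_stream_bytes
}

/// Hypothetical helper: number of merge passes needed to reduce `num_files`
/// sorted runs to a single run when at most `degree` runs are merged at a time.
/// Returns `None` when `degree < 2`, because no progress can be made.
fn merge_passes(num_files: usize, degree: usize) -> Option<usize> {
    if degree < 2 {
        return None;
    }
    let mut remaining = num_files;
    let mut passes = 0;
    while remaining > 1 {
        // Each pass turns every group of up to `degree` runs into one run.
        remaining = (remaining + degree - 1) / degree;
        passes += 1;
    }
    Some(passes)
}
```

Under these assumptions, a 100 MiB budget with 8 MiB per stream gives `max_merge_degree(100 << 20, 8 << 20) == 12`, and `merge_passes(100, 12)` is `Some(2)`. Dropping the degree to 2 raises that to `Some(7)`, which is the kind of extra work a tighter cap costs.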
