alamb commented on issue #15028: URL: https://github.com/apache/datafusion/issues/15028#issuecomment-3298250229
@rluvaton -- I reviewed this issue and agree that the sort problem seems to have been solved, so I'll close the issue. Nice work!

> and it failed with this (no sort this time after always spilling last level)
>
> ```
> Resources exhausted: Additional allocation failed with top memory consumers (across reservations) as:
>   ParquetSink(ArrowColumnWriter)#6(can spill: false) consumed 95.0 MB, peak 95.0 MB,
>   ParquetSink(ArrowColumnWriter)#9(can spill: false) consumed 4.2 MB, peak 4.2 MB,
>   ParquetSink(ArrowColumnWriter)#10(can spill: false) consumed 204.7 KB, peak 204.7 KB,
>   ParquetSink(ArrowColumnWriter)#5(can spill: false) consumed 95.2 KB, peak 95.2 KB,
>   ParquetSink(ArrowColumnWriter)#7(can spill: false) consumed 256.0 B, peak 256.0 B,
>   ParquetSink(ArrowColumnWriter)#8(can spill: false) consumed 256.0 B, peak 256.0 B,
>   ParquetSink(SerializedFileWriter)#4(can spill: false) consumed 0.0 B, peak 0.0 B.
> ```

Given this description, this may be related to memory usage while writing to Parquet. It is not clear whether it is a bug or simply that DataFusion requires more memory to write 8 files in parallel than it was given.

I suggest we file a new ticket for optimizing memory usage in this case (writing to Parquet) if that is important to anyone.

There are some hints on memory tuning here: https://datafusion.apache.org/user-guide/configs.html#memory-limited-queries (basically, set `target_partitions` to something lower) that might also help.
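To put the quoted error in perspective, here is a rough Python sketch that totals the listed consumers. It is not part of DataFusion: the regular expression, the `parse_consumers` helper, and the assumption that the units are binary (1 KB = 1024 B) are mine. The consumer lines are copied from the error message above.

```python
import re

# Consumer lines copied from the quoted error message.
ERROR_TEXT = """\
ParquetSink(ArrowColumnWriter)#6(can spill: false) consumed 95.0 MB, peak 95.0 MB,
ParquetSink(ArrowColumnWriter)#9(can spill: false) consumed 4.2 MB, peak 4.2 MB,
ParquetSink(ArrowColumnWriter)#10(can spill: false) consumed 204.7 KB, peak 204.7 KB,
ParquetSink(ArrowColumnWriter)#5(can spill: false) consumed 95.2 KB, peak 95.2 KB,
ParquetSink(ArrowColumnWriter)#7(can spill: false) consumed 256.0 B, peak 256.0 B,
ParquetSink(ArrowColumnWriter)#8(can spill: false) consumed 256.0 B, peak 256.0 B,
ParquetSink(SerializedFileWriter)#4(can spill: false) consumed 0.0 B, peak 0.0 B.
"""

# Assumption: the sizes use binary units (1 KB = 1024 B).
UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}

# Hypothetical pattern for one consumer line; it only reads the "consumed" value.
CONSUMER_RE = re.compile(
    r"^(?P<name>\S+)\(can spill: (?P<spill>true|false)\) "
    r"consumed (?P<value>[\d.]+) (?P<unit>B|KB|MB|GB),",
    re.MULTILINE,
)


def parse_consumers(text):
    """Return (name, can_spill, consumed_bytes) for each consumer line."""
    consumers = []
    for m in CONSUMER_RE.finditer(text):
        size = float(m["value"]) * UNIT_BYTES[m["unit"]]
        consumers.append((m["name"], m["spill"] == "true", size))
    return consumers


consumers = parse_consumers(ERROR_TEXT)
total = sum(size for _, _, size in consumers)
largest = max(consumers, key=lambda c: c[2])

print(f"{len(consumers)} consumers, {total / 1024**2:.1f} MB in total")
print(f"largest: {largest[0]} ({largest[2] / total:.0%} of the total)")
print("any spillable:", any(spill for _, spill, _ in consumers))
```

Under these assumptions, a single `ArrowColumnWriter` reservation accounts for nearly all of the reported memory, and none of the listed consumers can spill.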
