2010YOUY01 commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2658837563
> I had a hard time making DataFusion Comet work on cloud instances with 4GB memory per CPU core, partially because DataFusion is very likely to allocates more memory than reserved and run into OOM, or run into various kinds of memory reservation failures. Yes, this feature is quite bug-prone, perhaps we should mark it as experimental to prevent someone to use it in production. Thank you so much for the efforts. Here are my thoughts on the changes > 1. Don't `try_collect` the result of merging all at once. We consume the merged batches one after another and reserve memory for each batch. Once the reservation fails we switch to "spill mode" and write all future batches into the spill file. This resolves the 2X memory allocation problem ("the second problem") reported by [Memory account not adding up in SortExec #10073](https://github.com/apache/datafusion/issues/10073), as well as this comment: [External sorting not working for (maybe only for string columns??) #12136 (comment)](https://github.com/apache/datafusion/issues/12136#issuecomment-2642135559) > 3. Reserves more memory for ingested batches to leave some room for merging. This PR reserves 2X memory for each batch, this works for most of the queries in sort-tpch benchmark (all except Q5 and Q8). User still have to configure `sort_spill_reservation_bytes` when memory reserved is not big enough for merging. I don't think it is a good change but this is the only solution I can think of to compensate for the extra memory usage for the row representation of sorted columns. The 2X memory problem is: specifying a query to run under 100M memory, the measured physical memory is 200M. (though the reality is even worse than 2X 🤦🏼 , see https://github.com/apache/datafusion/pull/14142) This is caused by, when the first time OOM happens: - there are already 1X batches in memory - Then it will be sorted and merged at once, meaning original batches should hold until all sorted runs are generated (now memory footprint is already 2X) - In the merging phase, additional col->row conversion consumes some extra memory (2X+) I can get the high-level idea of the more conservative memory accounting in point 3, it is used to also account for merged batches, but I get lost in the memory estimation details in the implementation (especially is this 2X estimation amplification only for merged batches, or also intermediate `Row`s), could you explain with a concrete example how everything is calculated? (for example, there is a `ExternalSorter` with 100M memory budget, and it will consume 10 batches, each with 20M size, how memory estimation is calculated in each step) > 2. `shrink_to_fit` every sorted batches reduce the memory footprint of sorted batches, otherwise sorted string arrays may take 2X the original space in the worst case, due to exponential growth of `MutableBuffer` for storing variable length binary values. `shrink_to_fit` is a no-op for primitive-type columns returned by `take_arrays` since they already have the right capacity, and benchmarking showed no significant performance regression for non-primitive types such as string arrays, so I think it is a good change. This resolves "the first problem" reported by [Memory account not adding up in SortExec #10073](https://github.com/apache/datafusion/issues/10073). I think the buffer resizing mechanism is not doubling each time, the default policy will allocate new constant size buffers https://docs.rs/arrow-array/54.1.0/src/arrow_array/builder/generic_bytes_view_builder.rs.html#120-122, so this change might not help -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org