2010YOUY01 commented on PR #14644:
URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2658837563
> I had a hard time making DataFusion Comet work on cloud instances with 4GB
memory per CPU core, partially because DataFusion is very likely to allocates
more memory than reserved and run into OOM, or run into various kinds of memory
reservation failures.
Yes, this feature is quite bug-prone, perhaps we should mark it as
experimental to prevent someone to use it in production. Thank you so much for
the efforts.
Here are my thoughts on the changes
> 1. Don't `try_collect` the result of merging all at once. We consume the
merged batches one after another and reserve memory for each batch. Once the
reservation fails we switch to "spill mode" and write all future batches into
the spill file. This resolves the 2X memory allocation problem ("the second
problem") reported by [Memory account not adding up in SortExec
#10073](https://github.com/apache/datafusion/issues/10073), as well as this
comment: [External sorting not working for (maybe only for string columns??)
#12136
(comment)](https://github.com/apache/datafusion/issues/12136#issuecomment-2642135559)
> 3. Reserves more memory for ingested batches to leave some room for
merging. This PR reserves 2X memory for each batch, this works for most of the
queries in sort-tpch benchmark (all except Q5 and Q8). User still have to
configure `sort_spill_reservation_bytes` when memory reserved is not big enough
for merging. I don't think it is a good change but this is the only solution I
can think of to compensate for the extra memory usage for the row
representation of sorted columns.
The 2X memory problem is: specifying a query to run under 100M memory, the
measured physical memory is 200M. (though the reality is even worse than 2X 🤦🏼
, see https://github.com/apache/datafusion/pull/14142)
This is caused by, when the first time OOM happens:
- there are already 1X batches in memory
- Then it will be sorted and merged at once, meaning original batches should
hold until all sorted runs are generated (now memory footprint is already 2X)
- In the merging phase, additional col->row conversion consumes some extra
memory (2X+)
I can get the high-level idea of the more conservative memory accounting in
point 3, it is used to also account for merged batches, but I get lost in the
memory estimation details in the implementation (especially is this 2X
estimation amplification only for merged batches, or also intermediate `Row`s),
could you explain with a concrete example how everything is calculated? (for
example, there is a `ExternalSorter` with 100M memory budget, and it will
consume 10 batches, each with 20M size, how memory estimation is calculated in
each step)
> 2. `shrink_to_fit` every sorted batches reduce the memory footprint of
sorted batches, otherwise sorted string arrays may take 2X the original space
in the worst case, due to exponential growth of `MutableBuffer` for storing
variable length binary values. `shrink_to_fit` is a no-op for primitive-type
columns returned by `take_arrays` since they already have the right capacity,
and benchmarking showed no significant performance regression for non-primitive
types such as string arrays, so I think it is a good change. This resolves "the
first problem" reported by [Memory account not adding up in SortExec
#10073](https://github.com/apache/datafusion/issues/10073).
I think the buffer resizing mechanism is not doubling each time, the default
policy will allocate new constant size buffers
https://docs.rs/arrow-array/54.1.0/src/arrow_array/builder/generic_bytes_view_builder.rs.html#120-122,
so this change might not help
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]