2010YOUY01 commented on PR #16192: URL: https://github.com/apache/datafusion/pull/16192#issuecomment-2951468269
> @2010YOUY01 Hi, I’ve been struggling a bit with tracking peak memory in SPM step, and I was wondering if I could ask for some help. > > ### 1. Can we add the memory for converted (row) batches to previous `peak_mem_used`? > Since `ExternalSorter` creates `SortPreservingMergeStream` for 2nd step, SPM, so I tried updating the peak memory metric inside `maybe_poll_stream` in `SortPreservingMergeStream` (which internally calls `poll_next` where `convert_batch` is done, and pushes batches into a `BatchBuilder`). But here’s my concern: if we keep adding the new reservation from this second step to the previous peak memory value, we might be overestimating. That’s because by the time the second step runs, some batches from the first step might have already been dropped. So, summing them might inflate the reported peak memory. I think the correct way to implement it is for each input batch, do `peak_mem = max(peak_mem, sum_all_related_reservations_current_size())`, it should not have any overestimate, otherwise this should be a bug. >I tried printing the total reserved size from the global memory pool manually (with tons of `println`) during execution, and it seems like there was a difference between the first and second steps, but it didn’t seem as large as the total size of all converted batches combined. The reason might be we're currently using a hack: when buffering batches use 2X memory size for reservation, so that the SPM step won't run out of memory. https://github.com/apache/datafusion/blob/0f83c1d233499e80fd9b0baf89dd624099c1d1ba/datafusion/physical-plan/src/sorts/sort.rs#L783 To correct my previous example: ```text Let's say we have a SortExec (with 1 partition) to handle 10 input batches, the execution process is: Reserve 10MB for merge_reservation() // peak_mem -- 0M Read input and buffer 10 batches (5MB each) Here we use 2X estimate so that later SPM step will have enough extra memory for `Row`s // peak_mem -- 100M SPM step (each batch's converted row format is 2MB) // peak_mem -- 70M ``` > ### 2. Parent Operator's memory reservation > Also, when the parent operator (e.g., `SortPreservingMergeExec`) executes, the reservation created by the earlier `SortExec` is not yet released. In this case, should `SortPreservingMergeExec` only track the peak memory of its own reservation? > I think the buffering step will always temporary drop the reservation first, and later let `SPM` reserve that back again 🤔 Here is a example that temporarily drop the reservation in sort-buffering step https://github.com/apache/datafusion/blob/0f83c1d233499e80fd9b0baf89dd624099c1d1ba/datafusion/physical-plan/src/sorts/sort.rs#L669 If you notice anything not released it might be a bug. Due to this 2x estimation hack in the implementation, I now think tracking peak memory in the SPM step is unnecessary. The current implementation, with more tests, should be good to go. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org