ding-young commented on PR #17029:
URL: https://github.com/apache/datafusion/pull/17029#issuecomment-3204025146

> For point 1: I vaguely remember in multi level merge, there is a logic to re-spill in-memory batches before the final merge, so that we don't have to special handlings for the mixed in-mem + spills case 🤔 If I’m not remembering it correctly, or we have missed some edges cases, we should do it (before the final merge, spill all in-mems first) for simplicity now.

After taking another look, it seems that the in-mem + spill case only happens in the first-round merge. After that, everything gets spilled. So while it's true that this case may use more memory than the reservation, it doesn't seem to be the major case, and I’ll hold off on addressing it for now.

> For point 2: I was expecting this should better be done after #15380, but it seems this optimization got stuck, I'll look into this issue in the next few days.

I’ve opened a [new PR](https://github.com/apache/datafusion/pull/17163) to address it. Would appreciate it if you could take a look :)

Besides that, just as a side note: I’m currently looking into a failing test case in this PR (memory validation). It’s related to `StringViewArray`, and I’m digging into why `get_array_memory_size` and `get_sliced_size` are so different even after running `gc()` before spilling.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org