ding-young commented on PR #17029:
URL: https://github.com/apache/datafusion/pull/17029#issuecomment-3204025146

   > For point 1: I vaguely remember that in multi level merge, there is logic to 
re-spill in-memory batches before the final merge, so that we don't need 
special handling for the mixed in-mem + spills case 🤔 If I’m not remembering 
it correctly, or we have missed some edge cases, we should do it (before the 
final merge, spill all in-mems first) for simplicity now.
   
   After taking another look, it seems that the in-mem + spill case only 
happens in the first-round merge. After that, everything gets spilled. So while 
it's true that this case may use more memory than the reservation, it doesn't 
seem to be the major case, and I’ll hold off on addressing it for now.
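
To illustrate the point about rounds, here is a toy model in plain Rust. It is only a sketch and not DataFusion's actual merge code: `Run`, `merge`, `multi_level_merge`, and the `fan_in` parameter are all hypothetical names, and spill files are modelled as plain vectors. It assumes every merge output is written back as a spilled run, which is why a mixed in-mem + spill merge can only show up in the first round.

```rust
// Toy model only: hypothetical types, not DataFusion's implementation.
#[derive(Debug, Clone, PartialEq)]
enum Run {
    InMemory(Vec<i32>), // sorted batch still held in memory
    Spilled(Vec<i32>),  // stands in for a sorted spill file on disk
}

impl Run {
    fn rows(&self) -> &[i32] {
        match self {
            Run::InMemory(r) | Run::Spilled(r) => r.as_slice(),
        }
    }

    fn is_in_memory(&self) -> bool {
        matches!(self, Run::InMemory(_))
    }
}

/// Combine sorted runs into one sorted vector.
/// (Concatenate-and-sort for brevity; a real merge would stream.)
fn merge(runs: &[Run]) -> Vec<i32> {
    let mut out: Vec<i32> = runs
        .iter()
        .flat_map(|r| r.rows().iter().copied())
        .collect();
    out.sort();
    out
}

/// Merge `runs` in rounds of at most `fan_in` inputs per merge.
/// Returns the final sorted rows and, for each round, whether any
/// merge in that round mixed in-memory and spilled inputs.
fn multi_level_merge(mut runs: Vec<Run>, fan_in: usize) -> (Vec<i32>, Vec<bool>) {
    assert!(fan_in >= 2, "fan_in must be at least 2 to make progress");
    let mut mixed_per_round = Vec::new();
    while runs.len() > 1 {
        let mut next = Vec::new();
        let mut mixed = false;
        for group in runs.chunks(fan_in) {
            let has_mem = group.iter().any(Run::is_in_memory);
            let has_spill = group.iter().any(|r| !r.is_in_memory());
            mixed |= has_mem && has_spill;
            // Assumption of this model: every merge output is spilled.
            next.push(Run::Spilled(merge(group)));
        }
        mixed_per_round.push(mixed);
        runs = next;
    }
    let rows = runs.pop().map(|r| r.rows().to_vec()).unwrap_or_default();
    (rows, mixed_per_round)
}
```

For example, calling `multi_level_merge` with the runs `InMemory([1, 4])`, `Spilled([3, 5])`, `InMemory([2, 9])`, `Spilled([0, 7])`, `Spilled([6, 8])` and `fan_in = 2` takes three rounds, and only the first round reports a mixed merge.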
   
   > For point 2: I was expecting this would be better done after #15380, but 
it seems that optimization got stuck. I'll look into this issue in the next few 
days.
   
   I’ve opened a [new PR](https://github.com/apache/datafusion/pull/17163) to 
address it. Would appreciate it if you could take a look :) 
   
   Besides that, just as a side note: I’m currently looking into a failing 
test case in this PR (memory validation). It’s related to `StringViewArray`, 
and I’m digging into why `get_array_memory_size` and `get_sliced_size` are so 
different even after running `gc()` before spilling.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

