mbutrovich commented on issue #20728:
URL: https://github.com/apache/datafusion/issues/20728#issuecomment-4007043460

   Thanks for the quick response @EmilyMatt!
   
   You're right that `try_grow` is called once per `sort_batch_stream` 
invocation, not per chunk. The issue is that `sort_batch_stream` is called 
[once per in-memory batch in the multi-batch merge 
path](https://github.com/apache/datafusion/blob/13cebf8432a3c649ebc3426933753e159b2b7b9a/datafusion/physical-plan/src/sorts/sort.rs#L687),
 and the old code used `drop(reservation)` — which for most `MemoryPool` 
implementations is very cheap (just decrementing a counter, no pool-level 
reacquisition needed). The new code does `reservation.free()` + 
`reservation.try_grow(total_sorted_size)`, which is two full pool roundtrips 
where there were previously zero.
   
   In Comet's case each pool call crosses JNI into Spark's 
`TaskMemoryManager.acquireExecutionMemory()`, which is expensive — but this 
would affect any `MemoryPool` implementation where `try_grow`/`shrink` do real 
work (coordination with an external memory manager, logging, etc.).
   
   We confirmed this by replacing `free()` + `try_grow()` with `try_resize()`, 
which only interacts with the pool for the delta (typically near-zero since 
sorted output ≈ input size). The `acquireMemory` samples dropped from 22,356 to 
451 (back to DF51 levels), and wall-clock time improved. I should have full 
benchmarks shortly.
   
   In the meantime I have a draft PR up already: #20729


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to