andygrove opened a new pull request, #3938:
URL: https://github.com/apache/datafusion-comet/pull/3938

   ## Which issue does this PR close?
   
   Backport of #3924 to `branch-0.14`. Closes #3921.
   
   ## Rationale for this change
   
   When Comet executes a shuffle, it creates two native execution contexts that 
run concurrently within the same Spark task. Previously, each context created 
its own memory pool sized to the full per-task memory limit, so the two contexts 
together could consume up to 2x the intended memory. This led to much higher 
memory usage than expected and to OOM errors.
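
The effect can be illustrated with a minimal sketch. `BoundedPool` and `total_granted` are hypothetical names used only for illustration and are not Comet's or DataFusion's actual types; the sketch only assumes a pool that rejects reservations beyond a fixed limit.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical bounded pool standing in for a per-task memory pool.
struct BoundedPool {
    limit: usize,
    reserved: AtomicUsize,
}

impl BoundedPool {
    fn new(limit: usize) -> Self {
        Self { limit, reserved: AtomicUsize::new(0) }
    }

    /// Reserve `bytes` if the pool stays within its limit; return whether it succeeded.
    fn try_grow(&self, bytes: usize) -> bool {
        self.reserved
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| {
                cur.checked_add(bytes).filter(|next| *next <= self.limit)
            })
            .is_ok()
    }
}

/// Total bytes granted when two execution contexts each request `request` bytes.
fn total_granted(ctx_a: &BoundedPool, ctx_b: &BoundedPool, request: usize) -> usize {
    let mut total = 0;
    if ctx_a.try_grow(request) {
        total += request;
    }
    if ctx_b.try_grow(request) {
        total += request;
    }
    total
}

/// Old behavior: each context owns a pool sized to the full per-task limit.
fn granted_with_separate_pools(limit: usize) -> usize {
    let a = BoundedPool::new(limit);
    let b = BoundedPool::new(limit);
    total_granted(&a, &b, limit)
}

/// New behavior: both contexts hold the same `Arc` to a single pool.
fn granted_with_shared_pool(limit: usize) -> usize {
    let shared = Arc::new(BoundedPool::new(limit));
    let (a, b) = (Arc::clone(&shared), Arc::clone(&shared));
    total_granted(&a, &b, limit)
}
```

With separate pools, both full-size requests succeed and the task holds twice its limit; with a shared pool, the second request is rejected.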
   
   ## What changes are included in this PR?
   
   Cherry-pick of #3924 with a minor conflict resolution (added the 
`parking_lot::Mutex` import, which was missing on the 0.14 branch).
   
   Changes from the original PR:
   - Make `fair_unified` and `greedy_unified` memory pools task-shared, so a 
single pool instance is reused across all native execution contexts within the 
same Spark task
   - Fix a tracing bug where `total_reserved_for_thread()` and 
`unregister_and_total()` double-counted memory when multiple execution contexts 
shared the same pool `Arc`
   - Update tuning guide to document that both pool types are shared across 
execution contexts
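
A rough sketch of the two code changes listed above, under stated assumptions: `Pool`, `TaskPools`, `total_reserved_naive`, and `total_reserved_dedup` are hypothetical names chosen for illustration, the registry is keyed by an assumed task id, and `std::sync::Mutex` is used instead of `parking_lot::Mutex` to keep the example dependency-free. It is not the actual implementation from #3924.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical memory pool with a fixed limit.
struct Pool {
    limit: usize,
    reserved: AtomicUsize,
}

impl Pool {
    /// Reserve `bytes` if the pool stays within its limit.
    fn try_grow(&self, bytes: usize) -> bool {
        self.reserved
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |cur| {
                cur.checked_add(bytes).filter(|next| *next <= self.limit)
            })
            .is_ok()
    }
}

/// Hypothetical registry mapping a task id to its single shared pool.
#[derive(Default)]
struct TaskPools {
    pools: Mutex<HashMap<i64, Arc<Pool>>>,
}

impl TaskPools {
    /// Return the existing pool for `task_id`, creating it on first use.
    fn get_or_create(&self, task_id: i64, limit: usize) -> Arc<Pool> {
        let mut pools = self.pools.lock().unwrap();
        Arc::clone(pools.entry(task_id).or_insert_with(|| {
            Arc::new(Pool { limit, reserved: AtomicUsize::new(0) })
        }))
    }
}

/// Naive accounting: sums per context, so a shared pool is counted once per context.
fn total_reserved_naive(contexts: &[Arc<Pool>]) -> usize {
    contexts.iter().map(|p| p.reserved.load(Ordering::Acquire)).sum()
}

/// Deduplicated accounting: each distinct pool (by pointer identity) is counted once.
fn total_reserved_dedup(contexts: &[Arc<Pool>]) -> usize {
    let mut seen: Vec<*const Pool> = Vec::new();
    let mut total = 0;
    for pool in contexts {
        let ptr = Arc::as_ptr(pool);
        if !seen.contains(&ptr) {
            seen.push(ptr);
            total += pool.reserved.load(Ordering::Acquire);
        }
    }
    total
}
```

Comparing pools by pointer identity rather than by value is what lets the accounting distinguish two contexts sharing one pool from two contexts that each own a pool with the same reservation.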
   
   ## How are these changes tested?
   
   Same tests as #3924. Verified that the native code compiles on the 0.14 branch 
after the cherry-pick.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

