andygrove opened a new pull request, #3938: URL: https://github.com/apache/datafusion-comet/pull/3938
## Which issue does this PR close?

Backport of #3924 to `branch-0.14`.

Closes #3921.

## Rationale for this change

When Comet executes a shuffle, it creates two native execution contexts that run concurrently within the same Spark task. Previously, each context created its own memory pool with the full per-task memory limit, effectively allowing 2x the intended memory to be consumed. This caused significantly higher memory usage than expected and led to OOM errors.

## What changes are included in this PR?

Cherry-pick of #3924 with minor conflict resolution (added the `parking_lot::Mutex` import, which was missing on the 0.14 branch).

Changes from the original PR:

- Make the `fair_unified` and `greedy_unified` memory pools task-shared, so a single pool instance is reused across all native execution contexts within the same Spark task
- Fix a tracing bug where `total_reserved_for_thread()` and `unregister_and_total()` double-counted memory when multiple execution contexts shared the same pool `Arc`
- Update the tuning guide to document that both pool types are shared across execution contexts

## How are these changes tested?

Same tests as #3924. Verified that the native code compiles on the 0.14 branch after the cherry-pick.
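To illustrate the task-shared pool idea in the PR description, here is a minimal sketch using only the Rust standard library. It is not Comet's implementation: `TaskMemoryPool`, `TaskPoolRegistry`, `get_or_create`, `release`, and `try_grow` are hypothetical names, and it assumes pools are keyed by a Spark task attempt ID and reference-counted by the number of live execution contexts. Both contexts of a task receive the same `Arc`, so their reservations count against one limit instead of two.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical per-task memory pool with a fixed byte limit.
pub struct TaskMemoryPool {
    limit: usize,
    reserved: AtomicUsize,
}

impl TaskMemoryPool {
    pub fn new(limit: usize) -> Self {
        Self { limit, reserved: AtomicUsize::new(0) }
    }

    /// Try to reserve `bytes`; fails if the shared limit would be exceeded.
    pub fn try_grow(&self, bytes: usize) -> Result<(), String> {
        let mut current = self.reserved.load(Ordering::SeqCst);
        loop {
            let new = current + bytes;
            if new > self.limit {
                return Err(format!("cannot reserve {bytes} bytes: {current} of {} in use", self.limit));
            }
            // Compare-and-swap so concurrent contexts cannot overshoot the limit.
            match self.reserved.compare_exchange(current, new, Ordering::SeqCst, Ordering::SeqCst) {
                Ok(_) => return Ok(()),
                Err(actual) => current = actual,
            }
        }
    }

    pub fn reserved(&self) -> usize {
        self.reserved.load(Ordering::SeqCst)
    }
}

/// Hypothetical registry: one pool per task attempt, plus a count of the
/// execution contexts currently using it.
#[derive(Default)]
pub struct TaskPoolRegistry {
    pools: Mutex<HashMap<u64, (Arc<TaskMemoryPool>, usize)>>,
}

impl TaskPoolRegistry {
    /// Return the task's existing pool, or create it on first use.
    /// Note: `limit` is only honoured by the first caller for a task.
    pub fn get_or_create(&self, task_attempt_id: u64, limit: usize) -> Arc<TaskMemoryPool> {
        let mut pools = self.pools.lock().unwrap();
        let entry = pools
            .entry(task_attempt_id)
            .or_insert_with(|| (Arc::new(TaskMemoryPool::new(limit)), 0));
        entry.1 += 1;
        Arc::clone(&entry.0)
    }

    /// Drop one context's use of the pool; remove it when no context remains.
    pub fn release(&self, task_attempt_id: u64) {
        let mut pools = self.pools.lock().unwrap();
        let remaining = match pools.get_mut(&task_attempt_id) {
            Some(entry) => {
                entry.1 -= 1;
                entry.1
            }
            None => return,
        };
        if remaining == 0 {
            pools.remove(&task_attempt_id);
        }
    }

    pub fn len(&self) -> usize {
        self.pools.lock().unwrap().len()
    }
}

fn main() {
    let registry = TaskPoolRegistry::default();
    // Two native contexts created for the same (hypothetical) task attempt 42.
    let ctx_a = registry.get_or_create(42, 100);
    let ctx_b = registry.get_or_create(42, 100);
    assert!(Arc::ptr_eq(&ctx_a, &ctx_b));

    ctx_a.try_grow(60).expect("within limit");
    // The second context sees the first one's reservation.
    match ctx_b.try_grow(60) {
        Ok(()) => println!("unexpected: reserved beyond the task limit"),
        Err(e) => println!("rejected as expected: {e}"),
    }

    registry.release(42);
    registry.release(42);
    println!("pools left: {}", registry.len());
}
```

Reference counting keeps the pool alive until the last context for the task finishes; a real implementation would also need to decide what happens if callers pass different limits for the same task.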

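The tracing fix concerns double counting: once two contexts hold the same pool `Arc`, summing each registered pool's reservation counts the shared pool twice. The sketch below shows one general way to avoid that, deduplicating by `Arc` pointer identity. It does not reproduce `total_reserved_for_thread()` or `unregister_and_total()`; `Pool`, `naive_total`, and `dedup_total` are hypothetical names chosen for illustration.

```rust
use std::collections::HashSet;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical minimal pool that only exposes its current reservation.
pub struct Pool {
    reserved: AtomicUsize,
}

impl Pool {
    pub fn new(reserved: usize) -> Self {
        Self { reserved: AtomicUsize::new(reserved) }
    }

    pub fn reserved(&self) -> usize {
        self.reserved.load(Ordering::Relaxed)
    }
}

/// Naive total: a pool registered by several contexts is counted once per context.
pub fn naive_total(registered: &[Arc<Pool>]) -> usize {
    registered.iter().map(|p| p.reserved()).sum()
}

/// Deduplicated total: each distinct pool (same `Arc` allocation) is counted once.
pub fn dedup_total(registered: &[Arc<Pool>]) -> usize {
    let mut seen: HashSet<*const Pool> = HashSet::new();
    registered
        .iter()
        .filter(|p| seen.insert(Arc::as_ptr(*p)))
        .map(|p| p.reserved())
        .sum()
}

fn main() {
    let shared = Arc::new(Pool::new(64));
    let other = Arc::new(Pool::new(16));
    // Two contexts of one task registered the same shared pool.
    let registered = vec![Arc::clone(&shared), Arc::clone(&shared), other];
    println!("naive = {}, dedup = {}", naive_total(&registered), dedup_total(&registered));
}
```

Here the naive sum reports 144 bytes while only 80 are actually reserved; the deduplicated sum reports 80.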