Re: [PR] feat: Various improvements to memory pool configuration, logging, and documentation [datafusion-comet]

via GitHub Fri, 17 Oct 2025 08:18:11 -0700


andygrove commented on code in PR #2538:
URL: https://github.com/apache/datafusion-comet/pull/2538#discussion_r2428961400



##########
docs/source/user-guide/latest/tuning.md:
##########
@@ -114,79 +88,36 @@ Workarounds for this problem include:
 
 ## Advanced Memory Tuning
 
-### Configuring Off-Heap Memory Pools
+### Configuring Comet Memory Pools
 
 Comet implements multiple memory pool implementations. The type of pool can be specified with `spark.comet.exec.memoryPool`.
 
-The valid pool types for off-heap mode are:
+The valid pool types are:
 
-- `fair_unified` (default when `spark.memory.offHeap.enabled=true` is set)
+- `fair_unified_global` (default when `spark.memory.offHeap.enabled=true` is set)
+- `fair_unified`
 - `greedy_unified`
 
-Both of these pools share off-heap memory between Spark and Comet. This approach is referred to as
+All of these pools share off-heap memory between Spark and Comet. This approach is referred to as
 unified memory management. The size of the pool is specified by `spark.memory.offHeap.size`.
 
-The `greedy_unified` pool type implements a greedy first-come first-serve limit. This pool works well for queries that do not
-need to spill or have a single spillable operator.
+Comet's memory accounting isn't 100% accurate and this can result in Comet using more memory than it reserves,
+leading to out-of-memory exceptions. To work around this issue, it is possible to
+set `spark.comet.exec.memoryPool.fraction` to a value less than `1.0` to restrict the amount of memory that can be
+reserved by Comet.
 
-The `fair_unified` pool type prevents operators from using more than an even fraction of the available memory
+The `fair_unified` pool types prevents operators from using more than an even fraction of the available memory
 (i.e. `pool_size / num_reservations`). This pool works best when you know beforehand
 the query has multiple operators that will likely all need to spill. Sometimes it will cause spills even
 when there is sufficient memory in order to leave enough memory for other operators.
 
-### Configuring On-Heap Memory Pools
-
-```{warning}
-Support for on-heap memory pools is deprecated and will be removed from a future release.
-```
-
-When running in on-heap mode, Comet will use its own dedicated memory pools that are not shared with Spark.
+`fair_unified_global` allows any task to use the full off-heap memory pool.

Review Comment:
   Thanks. I have updated this.
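
   For anyone reading the thread later, here is a rough sketch of the arithmetic the updated section describes. It is not Comet code. The configuration keys are the ones named in the diff; the helper names, the example values (`8g`, `0.8`, four reservations), and the assumption that `spark.comet.exec.memoryPool.fraction` simply scales the off-heap pool size are all illustrative guesses.

```python
# Illustrative only: helper names and example values are assumptions,
# not Comet's API. Only the configuration keys come from the quoted diff.

GIB = 1024 ** 3

# Hypothetical Spark settings using the keys mentioned in the docs change.
example_conf = {
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": "8g",
    "spark.comet.exec.memoryPool": "fair_unified",
    "spark.comet.exec.memoryPool.fraction": "0.8",
}


def reservable_bytes(pool_size_bytes: int, fraction: float) -> int:
    """Bytes Comet may reserve, ASSUMING the fraction scales the pool size."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0.0, 1.0]")
    return int(pool_size_bytes * fraction)


def fair_share_bytes(pool_size_bytes: int, num_reservations: int) -> int:
    """Per-operator cap from the docs: pool_size / num_reservations."""
    if num_reservations <= 0:
        raise ValueError("num_reservations must be positive")
    return pool_size_bytes // num_reservations


if __name__ == "__main__":
    pool = 8 * GIB  # matches the hypothetical "8g" above
    usable = reservable_bytes(pool, 0.8)
    print("reservable by Comet:", usable)
    print("fair share with 4 reservations:", fair_share_bytes(usable, 4))
```

   Under these assumptions, lowering the fraction shrinks every operator's fair share as well, which is one reason `fair_unified` may spill even when memory looks sufficient.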



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

