Yohahaha commented on code in PR #5439:
URL: https://github.com/apache/incubator-gluten/pull/5439#discussion_r1604684880
##########
gluten-core/src/main/scala/org/apache/gluten/GlutenPlugin.scala:
##########
@@ -164,20 +165,71 @@ private[gluten] class GlutenDriverPlugin extends
DriverPlugin with Logging {
// task slots
val taskSlots = SparkResourceUtil.getTaskSlots(conf)
- // Optimistic off-heap sizes, assuming all storage memory can be borrowed
into execution memory
- // pool, regardless of Spark option spark.memory.storageFraction.
- val offHeapSize = conf.getSizeAsBytes(GlutenConfig.GLUTEN_OFFHEAP_SIZE_KEY)
+ var onHeapSize: Long =
+ if (conf.contains(GlutenConfig.GLUTEN_ONHEAP_SIZE_KEY)) {
+ conf.getSizeAsBytes(GlutenConfig.GLUTEN_ONHEAP_SIZE_KEY)
+ } else {
+ // 1GB default
+ 1024 * 1024 * 1024
+ }
+
+ // If dynamic off-heap sizing is enabled, the off-heap size is calculated
based on the on-heap
+ // size. Otherwise, the off-heap size is set to the value specified by the
user (if any).
+ // Note that this means that we will IGNORE the off-heap size specified by
the user if the
+ // dynamic off-heap feature is enabled.
+ var offHeapSize: Long =
+ if (conf.getBoolean(GlutenConfig.GLUTEN_DYNAMIC_OFFHEAP_SIZING_ENABLED,
false)) {
+ // Since when dynamic off-heap sizing is enabled, we commingle on-heap
+ // and off-heap memory, we set the off-heap size to the usable on-heap
size. We will
+ // size it with a memory fraction, which can be aggressively set, but
the default
+ // is using the same way that Spark sizes on-heap memory:
+ //
+ // spark.gluten.memory.dynamic.offHeap.sizing.memory.fraction *
+ // (spark.executor.memory - 300MB).
+ //
+ // We will be careful to use the same configuration settings as Spark
to ensure
+ // that we are sizing the off-heap memory in the same way as Spark
sizes on-heap memory.
+ // The 300MB value, unfortunately, is hard-coded in Spark code.
+ ((onHeapSize - (300 * 1024 * 1024)) *
+
conf.getDouble(GlutenConfig.GLUTEN_DYNAMIC_OFFHEAP_SIZING_MEMORY_FRACTION,
0.6d)).toLong
+ } else if (conf.contains(GlutenConfig.GLUTEN_OFFHEAP_SIZE_KEY)) {
+ // Optimistic off-heap sizes, assuming all storage memory can be
borrowed into execution
+ // memory pool, regardless of Spark option
spark.memory.storageFraction.
+ conf.getSizeAsBytes(GlutenConfig.GLUTEN_OFFHEAP_SIZE_KEY)
+ } else {
+ // Default Spark Value.
+ 0L
+ }
+
conf.set(GlutenConfig.GLUTEN_OFFHEAP_SIZE_IN_BYTES_KEY,
offHeapSize.toString)
+ conf.set(GlutenConfig.GLUTEN_OFFHEAP_SIZE_KEY, offHeapSize.toString)
+
val offHeapPerTask = offHeapSize / taskSlots
conf.set(GlutenConfig.GLUTEN_TASK_OFFHEAP_SIZE_IN_BYTES_KEY,
offHeapPerTask.toString)
- // Pessimistic off-heap sizes, with the assumption that all non-borrowable
storage memory
- // determined by spark.memory.storageFraction was used.
- val fraction = 1.0d - conf.getDouble("spark.memory.storageFraction", 0.5d)
- val conservativeOffHeapPerTask = (offHeapSize * fraction).toLong /
taskSlots
- conf.set(
- GlutenConfig.GLUTEN_CONSERVATIVE_TASK_OFFHEAP_SIZE_IN_BYTES_KEY,
- conservativeOffHeapPerTask.toString)
+ // If we are using dynamic off-heap sizing, we should also enable off-heap
memory
+ // officially.
+ if (conf.getBoolean(GlutenConfig.GLUTEN_DYNAMIC_OFFHEAP_SIZING_ENABLED,
false)) {
+ conf.set(GlutenConfig.GLUTEN_OFFHEAP_ENABLED, "true")
Review Comment:
I'm a bit confused here, even user set `spark.memory.offHeap.enabled=false`,
it will still be enabled when dynamic off-heap sizing enabled?
I'm not sure whether Runtime#freeMemory could account the memory allocated
from Unsafe class, which means if we re-enable spark off-heap memory, this
portion of memory will not be accounted for within JVM. In another word,
Runtime#freeMemory does not account Spark off-heap memory usage.
I try ask GPT and the answer is
```
The memory allocated through Unsafe#allocateMemory is not accounted for by
the Runtime#freeMemory method.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]