Yohahaha commented on code in PR #5439:
URL: https://github.com/apache/incubator-gluten/pull/5439#discussion_r1606179989


##########
gluten-core/src/main/scala/org/apache/gluten/GlutenPlugin.scala:
##########
@@ -164,20 +165,71 @@ private[gluten] class GlutenDriverPlugin extends 
DriverPlugin with Logging {
     // task slots
     val taskSlots = SparkResourceUtil.getTaskSlots(conf)
 
-    // Optimistic off-heap sizes, assuming all storage memory can be borrowed 
into execution memory
-    // pool, regardless of Spark option spark.memory.storageFraction.
-    val offHeapSize = conf.getSizeAsBytes(GlutenConfig.GLUTEN_OFFHEAP_SIZE_KEY)
+    var onHeapSize: Long =
+      if (conf.contains(GlutenConfig.GLUTEN_ONHEAP_SIZE_KEY)) {
+        conf.getSizeAsBytes(GlutenConfig.GLUTEN_ONHEAP_SIZE_KEY)
+      } else {
+        // 1GB default
+        1024 * 1024 * 1024
+      }
+
+    // If dynamic off-heap sizing is enabled, the off-heap size is calculated 
based on the on-heap
+    // size. Otherwise, the off-heap size is set to the value specified by the 
user (if any).
+    // Note that this means that we will IGNORE the off-heap size specified by 
the user if the
+    // dynamic off-heap feature is enabled.
+    var offHeapSize: Long =
+      if (conf.getBoolean(GlutenConfig.GLUTEN_DYNAMIC_OFFHEAP_SIZING_ENABLED, 
false)) {
+        // Since when dynamic off-heap sizing is enabled, we commingle on-heap
+        // and off-heap memory, we set the off-heap size to the usable on-heap 
size. We will
+        // size it with a memory fraction, which can be aggressively set, but 
the default
+        // is using the same way that Spark sizes on-heap memory:
+        //
+        // spark.gluten.memory.dynamic.offHeap.sizing.memory.fraction *
+        //    (spark.executor.memory - 300MB).
+        //
+        // We will be careful to use the same configuration settings as Spark 
to ensure
+        // that we are sizing the off-heap memory in the same way as Spark 
sizes on-heap memory.
+        // The 300MB value, unfortunately, is hard-coded in Spark code.
+        ((onHeapSize - (300 * 1024 * 1024)) *
+          
conf.getDouble(GlutenConfig.GLUTEN_DYNAMIC_OFFHEAP_SIZING_MEMORY_FRACTION, 
0.6d)).toLong
+      } else if (conf.contains(GlutenConfig.GLUTEN_OFFHEAP_SIZE_KEY)) {
+        // Optimistic off-heap sizes, assuming all storage memory can be 
borrowed into execution
+        // memory pool, regardless of Spark option 
spark.memory.storageFraction.
+        conf.getSizeAsBytes(GlutenConfig.GLUTEN_OFFHEAP_SIZE_KEY)
+      } else {
+        // Default Spark Value.
+        0L
+      }
+
     conf.set(GlutenConfig.GLUTEN_OFFHEAP_SIZE_IN_BYTES_KEY, 
offHeapSize.toString)
+    conf.set(GlutenConfig.GLUTEN_OFFHEAP_SIZE_KEY, offHeapSize.toString)
+
     val offHeapPerTask = offHeapSize / taskSlots
     conf.set(GlutenConfig.GLUTEN_TASK_OFFHEAP_SIZE_IN_BYTES_KEY, 
offHeapPerTask.toString)
 
-    // Pessimistic off-heap sizes, with the assumption that all non-borrowable 
storage memory
-    // determined by spark.memory.storageFraction was used.
-    val fraction = 1.0d - conf.getDouble("spark.memory.storageFraction", 0.5d)
-    val conservativeOffHeapPerTask = (offHeapSize * fraction).toLong / 
taskSlots
-    conf.set(
-      GlutenConfig.GLUTEN_CONSERVATIVE_TASK_OFFHEAP_SIZE_IN_BYTES_KEY,
-      conservativeOffHeapPerTask.toString)
+    // If we are using dynamic off-heap sizing, we should also enable off-heap 
memory
+    // officially.
+    if (conf.getBoolean(GlutenConfig.GLUTEN_DYNAMIC_OFFHEAP_SIZING_ENABLED, 
false)) {
+      conf.set(GlutenConfig.GLUTEN_OFFHEAP_ENABLED, "true")

Review Comment:
   @ulysses-you thank you for the explanation.
   
   > if there is enough free onhep memory, then we can borrow(logical borrow) 
it to offheap side and prevent the onheap memory to be allocated.
   
   I got the core idea, but I dont understand why it can prevent the onheap 
memory allocation totally, would you give more explanation? thank you!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to