zhztheplayer commented on code in PR #8132:
URL: https://github.com/apache/incubator-gluten/pull/8132#discussion_r1883368768


##########
gluten-core/src/main/java/org/apache/gluten/memory/memtarget/TreeMemoryTargets.java:
##########
@@ -118,7 +125,29 @@ public long borrow(long size) {
         return 0;
       }
       ensureFreeCapacity(size);
-      return borrow0(Math.min(freeBytes(), size));
+      long requiredSize = Math.min(freeBytes(), size);
+      long granted = borrow0(requiredSize);
+
+      // If isDynamicCapacity is true, which means it is controlled by vanilla 
Spark,
+      // and if the granted memory is less than the required size, it may be 
because
+      // this task holds memory exceeding maxMemoryPerTask. Therefore, we 
actively retry
+      // spilling all memory. After this, if there is still not enough memory 
acquired,
+      // it should result in an OOM.
+      if (granted < requiredSize && isDynamicCapacity) {
+        LOGGER.info(
+            "Exceed Spark perTaskLimit with maxTaskSizeDynamic when "
+                + "require:{} got:{}, try spill all.",
+            requiredSize,
+            granted);
+        long spilled = TreeMemoryTargets.spillTree(this, Long.MAX_VALUE);
+        long remain = requiredSize - granted;
+        if (spilled > remain) {
+          granted += borrow0(remain);
+        } else {
+          // OOM
+        }
+      }
+      return granted;

Review Comment:
   hi @kecookier, thank you for working on this.
   
   Can we add a individual memory target for doing the retries?
   
   Something like
   
   ```java
     public static class RetryOnOomMemoryTarget extends MemoryTarget {
       private final TreeMemoryTarget target;
   
       @Override
       public long borrow(long size) {
         final long granted = target.borrow(size);
         if (granted >= size) {
           return granted;
         }
         // Granted < size. Spill the underlying memory target then retry 
borrowing.
         final long remaining = size - granted;
         TreeMemoryTargets.spillTree(target, Long.MAX_VALUE);
         final long granted2 = target.borrow(remaining);
         return granted + granted2;
       }
   ```
   
   I can help do a further benchmark too see if chaining another memory target 
hurts the overall performance. Thought we can make a try like this anyway.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to