zhztheplayer commented on code in PR #8132:
URL: https://github.com/apache/incubator-gluten/pull/8132#discussion_r1883368768
##########
gluten-core/src/main/java/org/apache/gluten/memory/memtarget/TreeMemoryTargets.java:
##########
@@ -118,7 +125,29 @@ public long borrow(long size) {
return 0;
}
ensureFreeCapacity(size);
- return borrow0(Math.min(freeBytes(), size));
+ long requiredSize = Math.min(freeBytes(), size);
+ long granted = borrow0(requiredSize);
+
+ // If isDynamicCapacity is true, which means it is controlled by vanilla
Spark,
+ // and if the granted memory is less than the required size, it may be
because
+ // this task holds memory exceeding maxMemoryPerTask. Therefore, we
actively retry
+ // spilling all memory. After this, if there is still not enough memory
acquired,
+ // it should result in an OOM.
+ if (granted < requiredSize && isDynamicCapacity) {
+ LOGGER.info(
+ "Exceed Spark perTaskLimit with maxTaskSizeDynamic when "
+ + "require:{} got:{}, try spill all.",
+ requiredSize,
+ granted);
+ long spilled = TreeMemoryTargets.spillTree(this, Long.MAX_VALUE);
+ long remain = requiredSize - granted;
+ if (spilled > remain) {
+ granted += borrow0(remain);
+ } else {
+ // OOM
+ }
+ }
+ return granted;
Review Comment:
hi @kecookier
Can we add a individual memory target for doing the retries?
Something like
```java
public static class RetryOnOomMemoryTarget extends MemoryTarget {
private final TreeMemoryTarget target;
@Override
public long borrow(long size) {
final long granted = target.borrow(size);
if (granted >= size) {
return granted;
}
// Granted < size. Spill the underlying memory target then retry
borrowing.
final long remaining = size - granted;
TreeMemoryTargets.spillTree(target, Long.MAX_VALUE);
final long granted2 = target.borrow(remaining);
return granted + granted2;
}
```
I can help do a further benchmark too see if chaining another memory target
hurts the overall performance. Thought we can make a try like this anyway.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]