RexXiong commented on code in PR #3018:
URL: https://github.com/apache/celeborn/pull/3018#discussion_r1912705474
##########
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala:
##########
@@ -3858,7 +3858,7 @@ object CelebornConf extends Logging {
.doc("If direct memory usage is less than this limit, worker will
resume.")
.version("0.2.0")
.doubleConf
- .createWithDefault(0.7)
+ .createWithDefault(0.3)
Review Comment:
Maybe we can add a new conf for pinnedMemoryToResume and keep exist conf
for directMemoryRatioToResume
##########
worker/src/main/java/org/apache/celeborn/service/deploy/worker/memory/MemoryManager.java:
##########
@@ -293,6 +297,18 @@ public boolean shouldEvict(boolean
aggressiveMemoryFileEvictEnabled, double evic
public ServingState currentServingState() {
long memoryUsage = getMemoryUsage();
+ long allocatedMemory;
+ if (networkMemoryAllocatorPooled) {
+ allocatedMemory = getAllocatedMemory();
+ } else {
+ allocatedMemory = memoryUsage;
+ }
+ // trigger resume
+ // CELEBORN-1792: resume should use pinnedDirectMemory instead of
usedDirectMemory
Review Comment:
Although we needn't change to pause state, it would be better to call trim
when netty direct memory used above
pausePushDataThreshold/pauseReplicateThreshold, WDYT?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]