CodingCat commented on code in PR #2358:
URL: https://github.com/apache/incubator-celeborn/pull/2358#discussion_r1522471338
##########
client-spark/common/src/main/java/org/apache/spark/shuffle/celeborn/SortBasedPusher.java:
##########
@@ -169,6 +238,7 @@ public long pushData() throws IOException {
numPartitions);
mapStatusLengths[currentPartition].add(bytesWritten);
afterPush.accept(bytesWritten);
+ memoryThresholdManager.updateStats(offSet, offSet == pushBufferMaxSize);
Review Comment:
Actually, I tried comparing against `pushBufferMaxSize / (1 + factor)` and found
that my unit test always failed in that case. The reason is that we trigger
line 251, `memoryThresholdManager.updateStats(offSet, true);`, many times with
a partially filled buffer, i.e. `shouldPushedBytes / shouldPushedCount` ends up
way smaller than `(1 + factor) * maxBufferSize`. Of course you can tune
`factor`, but IMHO that adds to the user's cognitive burden, and the cost of
keeping a more accurate measurement is negligible, so why not?

I agree that `offSet == pushBufferMaxSize` is rarely true, but to make a
precise decision about whether to increase the buffer we still need to track
`shouldPushedBytes` and `shouldPushedCount`. Therefore we still need a switch
in the `updateStats` method that controls whether these two values are updated.
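To illustrate the shape of the switch being discussed, here is a minimal
sketch of a `updateStats(bytes, fullyFilled)` method that only folds
fully-filled pushes into the running average. This is a hypothetical
standalone class for illustration; the field and method names
(`shouldPushedBytes`, `shouldPushedCount`, `averagePushedBytes`) are
assumptions, not the actual Celeborn implementation.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch, not the real Celeborn MemoryThresholdManager.
class MemoryThresholdManagerSketch {
  // Bytes and count of pushes whose buffer was completely filled
  // (i.e. offSet == pushBufferMaxSize at push time).
  private final LongAdder shouldPushedBytes = new LongAdder();
  private final LongAdder shouldPushedCount = new LongAdder();

  // The boolean is the "switch" from the review: partially filled
  // pushes are reported but deliberately excluded from the average,
  // so they cannot drag shouldPushedBytes/shouldPushedCount down.
  void updateStats(long bytesWritten, boolean bufferFullyFilled) {
    if (bufferFullyFilled) {
      shouldPushedBytes.add(bytesWritten);
      shouldPushedCount.increment();
    }
  }

  // Average size of fully-filled pushes; 0 if none were recorded yet.
  long averagePushedBytes() {
    long count = shouldPushedCount.sum();
    return count == 0 ? 0 : shouldPushedBytes.sum() / count;
  }

  public static void main(String[] args) {
    MemoryThresholdManagerSketch m = new MemoryThresholdManagerSketch();
    m.updateStats(1024, true);   // fully filled buffer: counted
    m.updateStats(100, false);   // partially filled: ignored by the average
    m.updateStats(1024, true);   // fully filled buffer: counted
    System.out.println(m.averagePushedBytes());
  }
}
```

With this split, a buffer-growth decision can compare `averagePushedBytes()`
against the current max buffer size without partially filled pushes skewing
the measurement, which is the failure mode described above.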
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]