CodingCat commented on code in PR #2358:
URL: https://github.com/apache/incubator-celeborn/pull/2358#discussion_r1522471338
##########
client-spark/common/src/main/java/org/apache/spark/shuffle/celeborn/SortBasedPusher.java:
##########
@@ -169,6 +238,7 @@ public long pushData() throws IOException {
numPartitions);
mapStatusLengths[currentPartition].add(bytesWritten);
afterPush.accept(bytesWritten);
+ memoryThresholdManager.updateStats(offSet, offSet == pushBufferMaxSize);
Review Comment:
Actually, I tried comparing against `pushBufferMaxSize / (1 + factor)` and found
that my unit test always failed in that case. The reason is that we trigger
line 251, `memoryThresholdManager.updateStats(offSet, true);`, many times with
a partially filled buffer, i.e. `shouldPushedBytes / shouldPushedCount` ends up
way smaller than `(1 + factor) * maxBufferSize`. Of course you can tune
`factor`, but IMHO that adds to the user's cognitive burden, and the cost of
keeping a more accurate measurement is negligible, so why not?

I agree that `offSet == pushBufferMaxSize` is rarely true, but to make a
precise decision about whether to increase the buffer we still need to track
`shouldPushedBytes` and `shouldPushedCount`. Therefore we still need a switch
in the `updateStats` method that controls whether these two values are updated.
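To illustrate the shape of the switch being discussed, here is a minimal
sketch of a `updateStats(bytes, fullyFilled)` method that only folds
fully-filled pushes into the running average. This is a hypothetical
standalone class for illustration; the field and method names
(`shouldPushedBytes`, `shouldPushedCount`, `averagePushedBytes`) are
assumptions, not the actual Celeborn implementation.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch, not the real Celeborn MemoryThresholdManager.
class MemoryThresholdManagerSketch {
  // Bytes and count of pushes whose buffer was completely filled
  // (i.e. offSet == pushBufferMaxSize at push time).
  private final LongAdder shouldPushedBytes = new LongAdder();
  private final LongAdder shouldPushedCount = new LongAdder();

  // The boolean is the "switch" from the review: partially filled
  // pushes are reported but deliberately excluded from the average,
  // so they cannot drag shouldPushedBytes/shouldPushedCount down.
  void updateStats(long bytesWritten, boolean bufferFullyFilled) {
    if (bufferFullyFilled) {
      shouldPushedBytes.add(bytesWritten);
      shouldPushedCount.increment();
    }
  }

  // Average size of fully-filled pushes; 0 if none were recorded yet.
  long averagePushedBytes() {
    long count = shouldPushedCount.sum();
    return count == 0 ? 0 : shouldPushedBytes.sum() / count;
  }

  public static void main(String[] args) {
    MemoryThresholdManagerSketch m = new MemoryThresholdManagerSketch();
    m.updateStats(1024, true);   // fully filled buffer: counted
    m.updateStats(100, false);   // partially filled: ignored by the average
    m.updateStats(1024, true);   // fully filled buffer: counted
    System.out.println(m.averagePushedBytes());
  }
}
```

With this split, a buffer-growth decision can compare `averagePushedBytes()`
against the current max buffer size without partially filled pushes skewing
the measurement, which is the failure mode described above.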
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]