CodingCat commented on code in PR #2358:
URL: https://github.com/apache/incubator-celeborn/pull/2358#discussion_r1522471770
##########
client-spark/spark-2/src/main/java/org/apache/spark/shuffle/celeborn/SortBasedShuffleWriter.java:
##########
@@ -211,7 +211,7 @@ private void fastWrite0(scala.collection.Iterator iterator)
throws IOException {
private void doPush() throws IOException {
long start = System.nanoTime();
- pusher.pushData();
+ pusher.pushData(false);
Review Comment:
I didn't intend to backport this functionality to spark-2. Shall I?
##########
client-spark/common/src/main/java/org/apache/spark/shuffle/celeborn/SortBasedPusher.java:
##########
@@ -169,6 +238,7 @@ public long pushData() throws IOException {
numPartitions);
mapStatusLengths[currentPartition].add(bytesWritten);
afterPush.accept(bytesWritten);
+        memoryThresholdManager.updateStats(offSet, offSet == pushBufferMaxSize);
Review Comment:
Actually, I tried comparing against `pushBufferMaxSize / (1 + factor)` and found that my unit test always failed in that case. The reason is that we triggered line 251, `memoryThresholdManager.updateStats(offSet, true);`, many times with a partially filled buffer.
I agree that `offSet == pushBufferMaxSize` is rarely true, but to make a precise decision about whether to increase the buffer we still need to track `shouldPushedBytes` and `shouldPushedCount`. Therefore we still need a switch in `updateStats` that controls whether these two values are updated.
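To illustrate the point above, here is a minimal sketch of how such a switch could work. This is a hypothetical simplification, not the actual Celeborn `MemoryThresholdManager`: the class name, counter names, and the `shouldGrowBuffer` heuristic are assumptions made for illustration; only the idea of gating the "should-push" counters on a full-buffer flag comes from the discussion.

```java
// Hypothetical sketch: only pushes triggered by a completely filled buffer
// (offSet == pushBufferMaxSize) should feed the counters that drive the
// buffer-growth decision; partial flushes must not pollute them.
class MemoryThresholdManagerSketch {
  private long shouldPushedBytes = 0;
  private long shouldPushedCount = 0;

  // bytes: size of the data just pushed.
  // updateShouldPushStats: the switch discussed above — true only when the
  // push happened with a full buffer.
  void updateStats(long bytes, boolean updateShouldPushStats) {
    if (updateShouldPushStats) {
      shouldPushedBytes += bytes;
      shouldPushedCount += 1;
    }
  }

  // Illustrative heuristic: grow the buffer only if full-buffer pushes
  // actually occurred and their average size reaches the current max.
  boolean shouldGrowBuffer(long pushBufferMaxSize) {
    return shouldPushedCount > 0
        && shouldPushedBytes / shouldPushedCount >= pushBufferMaxSize;
  }
}
```

With this shape, a partial flush calls `updateStats(offSet, false)` and leaves the decision counters untouched, which avoids the false positives described in the failed unit test.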
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]