RexXiong commented on code in PR #3261:
URL: https://github.com/apache/celeborn/pull/3261#discussion_r2141554026
##########
client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java:
##########
@@ -721,6 +740,36 @@ public synchronized void close() {
}
}
+ void validateIntegrity() throws IOException {
+ if (integrityChecked || !shuffleIntegrityCheckEnabled) {
+ return;
+ }
+
+ if (readSkewPartitionWithoutMapRange) {
+ shuffleClient.readReducerPartitionEnd(
Review Comment:
The current implementation mixes two concepts,
numberOfSubPartitions/currentIndexOfSubPartition and startMapIndex/endMapIndex,
across different method signatures, such as readReducerPartitionEnd(int
startMapIndex, int endMapIndex) and readPartition(int startMapIndex, int
endMapIndex). Meanwhile, CelebornInputStreamImpl uses both. For
consistency, IMO we should use (int startMapIndex, int endMapIndex), as
CelebornShuffleReader does.
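
As a rough illustration of the suggestion above, the sketch below shows two calls sharing one `(int startMapIndex, int endMapIndex)` convention. Everything here is hypothetical and heavily simplified: `RangeReader`, `RecordingReader`, and `checkRange` are invented for the example, and the real Celeborn methods take more parameters than shown.

```java
import java.util.ArrayList;
import java.util.List;

public class MapRangeSignatureSketch {

  /** Hypothetical, simplified interface: both calls take the same map-range pair. */
  interface RangeReader {
    void readPartition(int startMapIndex, int endMapIndex);

    void readReducerPartitionEnd(int startMapIndex, int endMapIndex);
  }

  /** Toy implementation that records calls instead of doing any I/O. */
  static class RecordingReader implements RangeReader {
    final List<String> calls = new ArrayList<>();

    @Override
    public void readPartition(int startMapIndex, int endMapIndex) {
      checkRange(startMapIndex, endMapIndex);
      calls.add("readPartition(" + startMapIndex + ", " + endMapIndex + ")");
    }

    @Override
    public void readReducerPartitionEnd(int startMapIndex, int endMapIndex) {
      checkRange(startMapIndex, endMapIndex);
      calls.add("readReducerPartitionEnd(" + startMapIndex + ", " + endMapIndex + ")");
    }

    // Assumed sanity check for this sketch only; not taken from the PR.
    private static void checkRange(int startMapIndex, int endMapIndex) {
      if (startMapIndex < 0 || endMapIndex < startMapIndex) {
        throw new IllegalArgumentException(
            "invalid map range: " + startMapIndex + ", " + endMapIndex);
      }
    }
  }

  public static void main(String[] args) {
    RecordingReader reader = new RecordingReader();
    // The same range flows through both calls unchanged.
    reader.readPartition(0, 4);
    reader.readReducerPartitionEnd(0, 4);
    reader.calls.forEach(System.out::println);
  }
}
```

With a single parameter pair, a caller never has to translate between sub-partition indices and map indices when moving from one call to the other.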
##########
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala:
##########
@@ -5356,6 +5358,14 @@ object CelebornConf extends Logging {
.bytesConf(ByteUnit.BYTE)
.createWithDefaultString("512k")
+ val CLIENT_SPARK_SHUFFLE_INTEGRITY_CHECK_ENABLED: ConfigEntry[Boolean] =
Review Comment:
It may be better to revert to
`celeborn.client.shuffle.integrityCheck.enabled`, since the configuration is used in
many places, not only in Spark-specific code. As a compromise, just add a comment
noting that this configuration only affects Spark.
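
A minimal sketch of that compromise, assuming a plain `java.util.Properties` lookup rather than Celeborn's own config builder: the key keeps its engine-neutral name, and the documentation string carries the Spark-only note. The class name, the `DOC` wording, and the `false` default are assumptions made for illustration; the PR excerpt does not show the actual default.

```java
import java.util.Properties;

public class IntegrityCheckConfSketch {
  // Engine-neutral key proposed in the review comment.
  static final String KEY = "celeborn.client.shuffle.integrityCheck.enabled";

  // Hypothetical doc text: the Spark-only scope is stated here, not in the key name.
  static final String DOC =
      "Whether to check shuffle data integrity on read. Only affects Spark.";

  // Assumed default of false, for this sketch only.
  static boolean integrityCheckEnabled(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(KEY, "false"));
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(integrityCheckEnabled(conf)); // prints "false"
    conf.setProperty(KEY, "true");
    System.out.println(integrityCheckEnabled(conf)); // prints "true"
  }
}
```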
##########
client/src/main/java/org/apache/celeborn/client/ShuffleClient.java:
##########
@@ -203,13 +203,19 @@ public abstract int mergeData(
public abstract void pushMergedData(int shuffleId, int mapId, int attemptId)
throws IOException;
- // Report partition locations written by the completed map task of ReducePartition Shuffle Type
- public abstract void mapperEnd(int shuffleId, int mapId, int attemptId, int numMappers)
+ // Report partition locations written by the completed map task of MapPartition Shuffle Type.
Review Comment:
Why change ReducePartition to MapPartition?