Re: [PR] [CELEBORN-894] End to End Integrity Checks [celeborn]

via GitHub Thu, 12 Jun 2025 05:14:55 -0700


RexXiong commented on code in PR #3261:
URL: https://github.com/apache/celeborn/pull/3261#discussion_r2141554026



##########
client/src/main/java/org/apache/celeborn/client/read/CelebornInputStream.java:
##########
@@ -721,6 +740,36 @@ public synchronized void close() {
       }
     }
 
+    void validateIntegrity() throws IOException {
+      if (integrityChecked || !shuffleIntegrityCheckEnabled) {
+        return;
+      }
+
+      if (readSkewPartitionWithoutMapRange) {
+        shuffleClient.readReducerPartitionEnd(

Review Comment:
   The current implementation mixes two concepts, 
numberOfSubPartitions/currentIndexOfSubPartition and startMapIndex/endMapIndex, 
across different method signatures, such as readReducerPartitionEnd(int 
startMapIndex, int endMapIndex) and readPartition(int startMapIndex, int 
endMapIndex). Meanwhile, CelebornInputStreamImpl uses both pairs. For 
consistency, IMO every signature should take (int startMapIndex, int 
endMapIndex), as CelebornShuffleReader does.



##########
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala:
##########
@@ -5356,6 +5358,14 @@ object CelebornConf extends Logging {
       .bytesConf(ByteUnit.BYTE)
       .createWithDefaultString("512k")
 
+  val CLIENT_SPARK_SHUFFLE_INTEGRITY_CHECK_ENABLED: ConfigEntry[Boolean] =

Review Comment:
   Maybe it would be better to revert to 
`celeborn.client.shuffle.integrityCheck.enabled`, since the configuration is 
used in many places, not only in Spark-specific code. As a compromise, just 
add a comment noting that this configuration currently only affects Spark.



##########
client/src/main/java/org/apache/celeborn/client/ShuffleClient.java:
##########
@@ -203,13 +203,19 @@ public abstract int mergeData(
 
   public abstract void pushMergedData(int shuffleId, int mapId, int attemptId) throws IOException;
 
-  // Report partition locations written by the completed map task of ReducePartition Shuffle Type
-  public abstract void mapperEnd(int shuffleId, int mapId, int attemptId, int numMappers)
+  // Report partition locations written by the completed map task of MapPartition Shuffle Type.

Review Comment:
   Why change ReducePartition to MapPartition?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
