mridulm commented on code in PR #2609:
URL: https://github.com/apache/celeborn/pull/2609#discussion_r1673015879
##########
client-spark/spark-2/src/main/java/org/apache/spark/shuffle/celeborn/SparkUtils.java:
##########
@@ -136,6 +136,11 @@ public static int celebornShuffleId(
}
}
+ public static int getMapAttemptNumber(TaskContext context) {
+ assert (context.stageAttemptNumber() < (1 << 15)
+     && context.attemptNumber() < (1 << 16));
+ return (context.stageAttemptNumber() << 16) | context.attemptNumber();
+ }
+
Review Comment:
Do we have a reproducible test that surfaces the issue? That is, a test which
currently fails but is expected to pass? We can explore what the fix should be
once we have a reproducible test.
The test in this PR only checks that
`spark.stage.maxConsecutiveAttempts` is below 32k, and I am not sure what
the general behavior expected in the test for #2611 is.
To add: when `throwsFetchFailure` is enabled, a different Celeborn shuffle id
is used for each stage attempt (all of which map to the same Spark shuffle
id). The source of confusion could be the variable name `shuffleId` - within
Celeborn it need not match Spark's shuffle id.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]