EnricoMi commented on code in PR #1529:
URL: https://github.com/apache/incubator-uniffle/pull/1529#discussion_r1492539414
##########
client-spark/spark3/src/main/java/org/apache/spark/shuffle/RssShuffleManager.java:
##########
@@ -518,6 +523,33 @@ public <K, V> ShuffleWriter<K, V> getWriter(
shuffleHandleInfo);
}
+  /**
+   * Provides a task attempt id that is unique for a shuffle stage.
+   *
+   * <p>We are not using context.taskAttemptId() here as this is a monotonically increasing
+   * number that is unique across the entire Spark app and can therefore grow very large,
+   * practically up to Long.MAX_VALUE. That would overflow the bits in the block id.
+   *
+   * <p>Here we use the map index or task id, appended by the attempt number per task. The map
+   * index is limited by the number of partitions of a stage. The attempt number per task is
+   * limited /
Review Comment:
That is a bit surprising, but looking at the relevant code, the maximum number of
failures is not considered when a task is resubmitted as speculative:
https://github.com/apache/spark/blob/2abd3a2f445e86337ad94da19f301cb2b8bc232f/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L1226-L1227
I will account for that!
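
To illustrate the idea being discussed (this is a hypothetical sketch, not the PR's actual code; the class and method names and the bit widths are assumptions): the per-stage map index and the per-task attempt number can be packed into a single long so the result fits the block-id bits, instead of using the app-wide monotonically increasing `context.taskAttemptId()`.

```java
// Hypothetical sketch of packing map index + attempt number into one long.
// ATTEMPT_BITS is an assumed budget; speculative resubmissions can exceed
// spark.task.maxFailures, so the real bound must account for that.
public class TaskAttemptIdSketch {
  private static final int ATTEMPT_BITS = 4; // assume at most 2^4 attempts per task
  private static final int MAX_ATTEMPTS = 1 << ATTEMPT_BITS;

  static long uniqueTaskAttemptId(int mapIndex, int attemptNo) {
    if (attemptNo >= MAX_ATTEMPTS) {
      throw new IllegalStateException(
          "attempt number " + attemptNo + " exceeds the "
              + ATTEMPT_BITS + " bits reserved for it");
    }
    // high bits: map index (bounded by the stage's partition count)
    // low bits:  attempt number for this task
    return ((long) mapIndex << ATTEMPT_BITS) | attemptNo;
  }

  public static void main(String[] args) {
    // map index 3, attempt 1 -> (3 << 4) | 1 = 49
    System.out.println(uniqueTaskAttemptId(3, 1)); // prints 49
  }
}
```

The key property is that the result is unique within a stage and bounded by partition count times attempt budget, rather than growing without bound across the whole app.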
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
For additional commands, e-mail: [email protected]