CodingCat commented on code in PR #3569:
URL: https://github.com/apache/celeborn/pull/3569#discussion_r2646595242


##########
client/src/main/scala/org/apache/celeborn/client/LifecycleManager.scala:
##########
@@ -1041,7 +1043,7 @@ class LifecycleManager(val appUniqueId: String, val conf: CelebornConf) extends
                 // So if a barrier stage is getting reexecuted, previous stage/attempt needs to
                 // be cleaned up as it is entirely unusuable
                 if (determinate && !isBarrierStage && !isCelebornSkewShuffleOrChildShuffle(
-                    appShuffleId)) {
+                    appShuffleId) && !conf.clientShuffleEarlyDeletion) {

Review Comment:
   We cannot reuse the shuffle id when this feature is turned on. Consider the
   following scenario:

   Stage B.0 depends on shuffle 1, which was written by stage A.0.

   Due to "too early deletion", shuffle 1 is lost, so we need to run A.1. By
   that point, shuffle 1 has already been removed from the "registered shuffle"
   set. If we reuse 1 as the id and send it to the tasks of A.1, they will fail
   with errors like "shuffle not registered".
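
The scenario above can be sketched with a toy model. This is a minimal illustration only, not Celeborn's actual `LifecycleManager` logic: the object `ShuffleIdReuseSketch` and all of its methods (`registerNew`, `earlyDelete`, `write`, `idForRerun`) are hypothetical names introduced here, and the "registered shuffle" set is reduced to a plain mutable set of ids.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of the scenario; not Celeborn code.
object ShuffleIdReuseSketch {
  // Ids of shuffles currently registered in this toy lifecycle manager.
  private val registered = mutable.Set.empty[Int]
  private var nextId = 0

  /** Clear all state so the sketch can be replayed from scratch. */
  def reset(): Unit = { registered.clear(); nextId = 0 }

  /** Allocate a fresh shuffle id and register it. */
  def registerNew(): Int = {
    nextId += 1
    registered += nextId
    nextId
  }

  /** Early deletion: the shuffle is dropped from the registered set. */
  def earlyDelete(id: Int): Unit = registered -= id

  /** What a task of the writing stage sees when handed a shuffle id. */
  def write(id: Int): Either[String, String] =
    if (registered.contains(id)) Right(s"wrote shuffle $id")
    else Left(s"shuffle $id not registered")

  /** Id handed to a re-executed stage attempt (A.1 in the example). */
  def idForRerun(previous: Int, reuse: Boolean): Int =
    if (reuse) previous else registerNew()
}
```

In this model, stage A.0 gets id 1 from `registerNew()`. After `earlyDelete(1)`, handing A.1 the reused id via `idForRerun(1, reuse = true)` makes `write` return `Left("shuffle 1 not registered")`, while `idForRerun(1, reuse = false)` allocates id 2 and the write succeeds.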


