advancedxy commented on issue #1840:
URL: https://github.com/apache/incubator-uniffle/issues/1840#issuecomment-2200253482

   > Could you help review this? @advancedxy I want to go on to finish this 
feature.
   
   Thanks for pinging me. I may be able to review your design and PRs this week. By the way, I haven't followed how a shuffle writer failure triggers a stage recompute, so I may be wrong about that.
   
   Some inline comments.
   > task will ask shuffle manager whether throwing spark's fetchFailedException.
   > At this time, shuffle manager will check whether the task failure reaches the spark max failure threshold. Once this condition is satisfied, this will make task throw and fail.
   
   I don't think this is correct per se. Shuffle write failures should not trigger a `FetchFailedException` from the executor side, which might mess up Spark's driver logic: the shuffle write stage doesn't involve a shuffle fetch at all if it doesn't read from another shuffle stage. I think a more appropriate approach would be for the shuffle writer to report its write failures and let the shuffle manager decide whether to retry the whole stage or simply relaunch tasks in the same stage with different shuffle server assignments (a sketch follows below). If it decides to retry the whole stage, it might need to do something hacky to fail the current shuffle write stage fast and unregister all its map outputs. This logic is similar to how Spark handles its barrier tasks. However, it might be hard to simulate that case from Uniffle's perspective.
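
   A minimal sketch of that decision flow, assuming a hypothetical `WriteFailureHandler` inside the shuffle manager; the class, method names, and threshold are all made up for illustration, not taken from Uniffle's codebase:

   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;

   public class WriteFailureHandler {

     enum Decision { REASSIGN_SERVERS, RETRY_WHOLE_STAGE }

     // Write-failure counts per stage id.
     private final Map<Integer, Integer> writeFailuresByStage = new ConcurrentHashMap<>();
     // Assumed threshold below which task-level relaunch is still worthwhile.
     private final int maxTaskLevelRetries;

     public WriteFailureHandler(int maxTaskLevelRetries) {
       this.maxTaskLevelRetries = maxTaskLevelRetries;
     }

     // Called when an executor reports that a task's shuffle write failed.
     public Decision onWriteFailure(int stageId) {
       int failures = writeFailuresByStage.merge(stageId, 1, Integer::sum);
       if (failures <= maxTaskLevelRetries) {
         // Cheap path: keep the stage alive and hand the relaunched task a
         // different set of shuffle servers.
         return Decision.REASSIGN_SERVERS;
       }
       // Expensive path: fail the running stage fast and unregister all of its
       // map outputs so Spark recomputes the whole stage, similar in spirit to
       // how Spark aborts barrier stages.
       return Decision.RETRY_WHOLE_STAGE;
     }
   }
   ```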
   
   > Write and fetch failures won't happen at the same time, that's ensured by 
the spark, that's not pipelined data flow.
   
   It might, as long as there's a stage that both reads from a Uniffle shuffle and writes shuffle data to Uniffle. There's only one instance of an exception per task, though. So I think the shuffle manager should be aware of that when receiving shuffle fetch failures and write failures; see the sketch below.
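
   Since each failed task surfaces exactly one exception, a single tagged report would be enough to keep the two cases apart. A rough sketch, with all names hypothetical:

   ```java
   public class FailureRouter {

     // The two failure kinds a task can report; at most one per task attempt.
     public enum FailureKind { SHUFFLE_FETCH_FAILURE, SHUFFLE_WRITE_FAILURE }

     public record TaskFailureReport(FailureKind kind, int stageId, long taskAttemptId) {}

     public void onTaskFailure(TaskFailureReport report) {
       switch (report.kind()) {
         case SHUFFLE_FETCH_FAILURE:
           // Missing input: the parent stage that produced the data must be
           // recomputed, which is what a FetchFailedException signals to Spark.
           handleFetchFailure(report);
           break;
         case SHUFFLE_WRITE_FAILURE:
           // Broken output: reassign servers or retry the current stage; do
           // not surface this as a fetch failure.
           handleWriteFailure(report);
           break;
       }
     }

     private void handleFetchFailure(TaskFailureReport report) { /* ... */ }
     private void handleWriteFailure(TaskFailureReport report) { /* ... */ }
   }
   ```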
   
   > So the two phase deletion should be introduced to solve the problem of 
data visibility and delete speed, that splits the deletion into rename sync and 
delete async 2 phases.
   
   Hmm, don't we encode the stage attempt number in the output path? If the stage attempt number is encoded, I think we can just leave the shuffle data as it is and put it into an async delete queue if necessary.
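
   To make that concrete, a minimal sketch assuming the attempt number is part of the path; all paths, class, and method names here are made up. Readers would always resolve the current attempt's directory, so a stale attempt's data becomes invisible the moment the stage is retried, and the synchronous rename phase disappears:

   ```java
   import java.util.concurrent.BlockingQueue;
   import java.util.concurrent.LinkedBlockingQueue;

   public class AttemptScopedShuffleCleaner {

     private final BlockingQueue<String> deleteQueue = new LinkedBlockingQueue<>();

     // Encode the stage attempt into the output path, e.g.
     // /uniffle/app_1/shuffle_0/attempt_2/partition_5 (layout is illustrative).
     public String shuffleDataPath(String appId, int shuffleId, int stageAttempt, int partition) {
       return String.format("/uniffle/%s/shuffle_%d/attempt_%d/partition_%d",
           appId, shuffleId, stageAttempt, partition);
     }

     // On a stage retry there is nothing to rename: readers already resolve the
     // new attempt's directory, so the stale attempt is merely queued.
     public void retireAttempt(String appId, int shuffleId, int staleAttempt) {
       deleteQueue.offer(String.format("/uniffle/%s/shuffle_%d/attempt_%d",
           appId, shuffleId, staleAttempt));
     }

     // A daemon thread drains the queue and deletes directories at its own pace.
     public void startAsyncDeleter() {
       Thread cleaner = new Thread(() -> {
         try {
           while (!Thread.currentThread().isInterrupted()) {
             deleteRecursively(deleteQueue.take());
           }
         } catch (InterruptedException e) {
           Thread.currentThread().interrupt();
         }
       }, "shuffle-async-deleter");
       cleaner.setDaemon(true);
       cleaner.start();
     }

     private void deleteRecursively(String dir) { /* storage-specific removal */ }
   }
   ```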

