advancedxy commented on issue #1840: URL: https://github.com/apache/incubator-uniffle/issues/1840#issuecomment-2200253482
> Could you help review this? @advancedxy I want to go on to finish this feature.

Thanks for pinging me. I may be able to review your design and PRs this week. By the way, I haven't followed how a shuffle writer failure triggers a stage recompute, so I may be wrong about that. Some inline comments below.

> task will ask shuffle manager whether throwing spark's fetchFailedException. At this time, shuffle manager will check whether the task failure reaches the spark max failure threshold. Once this condition is satisfied, this will make task throw and fail.

I don't think this is correct per se. A shuffle write failure should not trigger a `FetchFailedException` from the executor side, which might mess up Spark's driver logic: the shuffle write stage doesn't involve a shuffle fetch at all if it doesn't read from another shuffle stage. A more appropriate approach would be for the shuffle writer to report its write failures and let the shuffle manager decide whether to retry the whole stage or simply relaunch tasks in the same stage with different shuffle server assignments. If it decides to retry the whole stage, it might need to do something hacky to fail the current shuffle write stage fast and unregister all its map outputs. This logic is similar to how Spark handles its barrier tasks, although it might be hard to simulate that case from Uniffle's side. (See the first sketch at the end of this comment for what such a decision path could look like.)

> Write and fetch failures won't happen at the same time, that's ensured by the spark, that's not pipelined data flow.

They might, as long as there's one stage that reads from Uniffle's shuffle and also writes shuffle data to Uniffle. There's only one instance of an exception per task, though, so the shuffle manager should be aware of that when it receives both shuffle fetch failures and write failures.

> So the two phase deletion should be introduced to solve the problem of data visibility and delete speed, that splits the deletion into rename sync and delete async 2 phases.

Hmm, don't we encode the stage attempt number in the output path? If the stage attempt number is encoded, I think we can just leave the shuffle data as it is and put it into an async delete queue if necessary (see the second sketch below).
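To make the first point concrete, here is a minimal sketch of what a driver-side decision path could look like. None of these names (`WriteFailureCoordinator`, `onWriteFailure`, the thresholds) are Uniffle's actual API; this is only an illustration of "report write failures, then choose between reassigning shuffle servers and retrying the whole stage":

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only: class and method names are illustrative,
// not part of Uniffle's real shuffle manager.
public class WriteFailureCoordinator {

  /** Decision returned to the reporting task. */
  public enum Decision { REASSIGN_SERVERS, RETRY_WHOLE_STAGE }

  // Failures observed per (stageId, shuffleServerId).
  private final Map<String, Integer> failuresPerServer = new ConcurrentHashMap<>();
  private final int maxFailuresPerServer;
  private final int maxReassignments;

  public WriteFailureCoordinator(int maxFailuresPerServer, int maxReassignments) {
    this.maxFailuresPerServer = maxFailuresPerServer;
    this.maxReassignments = maxReassignments;
  }

  /**
   * Called when an executor reports that writing shuffle data for stageId to
   * shuffleServerId failed. The coordinator decides whether reassigning shuffle
   * servers is enough, or whether the whole stage should be failed fast and
   * resubmitted (which also implies unregistering its map outputs), similar in
   * spirit to how Spark aborts and resubmits barrier stages.
   */
  public synchronized Decision onWriteFailure(int stageId, String shuffleServerId,
                                              int reassignmentsSoFar) {
    String key = stageId + "/" + shuffleServerId;
    int failures = failuresPerServer.merge(key, 1, Integer::sum);

    if (reassignmentsSoFar < maxReassignments && failures <= maxFailuresPerServer) {
      // Looks like a single bad server: let the task retry with new assignments.
      return Decision.REASSIGN_SERVERS;
    }
    // Too many failures: recompute the whole stage.
    return Decision.RETRY_WHOLE_STAGE;
  }
}
```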
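And for the last point, a second sketch of what "encode the stage attempt in the path and delete asynchronously" could look like. Again, the path layout and class names here are assumptions for illustration, not Uniffle's actual on-disk format:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch only: the directory layout is illustrative.
public class ShuffleDataLayout {

  // Directories of superseded stage attempts, drained by a background delete thread.
  private final BlockingQueue<Path> asyncDeleteQueue = new LinkedBlockingQueue<>();

  /** Encode the stage attempt number in the output path so attempts never collide. */
  public Path shuffleDir(String basePath, String appId, int shuffleId, int stageAttempt) {
    return Paths.get(basePath, appId, String.valueOf(shuffleId), "attempt_" + stageAttempt);
  }

  /**
   * When a stage is retried, data from the previous attempt can be left in place
   * (readers only look at the latest attempt's directory) and simply enqueued for
   * asynchronous deletion -- no synchronous rename phase would be needed.
   */
  public void retireOldAttempt(String basePath, String appId, int shuffleId,
                               int oldStageAttempt) {
    asyncDeleteQueue.offer(shuffleDir(basePath, appId, shuffleId, oldStageAttempt));
  }

  public BlockingQueue<Path> deleteQueue() {
    return asyncDeleteQueue;
  }
}
```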
