yl09099 opened a new pull request, #1129: URL: https://github.com/apache/incubator-uniffle/pull/1129
### What changes were proposed in this pull request?

1. During the shuffle write phase, the ShuffleServer reports faulty nodes and reallocates the ShuffleServer list.
2. Triggers a stage-level retry in Spark. The faulty ShuffleServer node is excluded and the servers are reallocated before the retry.

### Why are the changes needed?

Adds fault tolerance to the shuffle write phase.

Fix: #825

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs, plus a new integration test: RSSStageDynamicServerReWriteTest.
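The write-phase recovery flow described above (report the faulty node, exclude it, reallocate, then retry the stage) can be illustrated with a minimal sketch. This is not the code in this PR: `Coordinator`, `ShuffleWriteError`, `run_stage`, and `demo` are hypothetical names, and the choice of which component reports the fault and performs the reassignment is an assumption made only for illustration.

```python
from dataclasses import dataclass, field


class ShuffleWriteError(Exception):
    """Hypothetical error raised when a write to a shuffle server fails."""

    def __init__(self, server):
        super().__init__(f"write to {server} failed")
        self.server = server


@dataclass
class Coordinator:
    """Hypothetical stand-in for whatever component assigns shuffle servers."""

    servers: list
    faulty: set = field(default_factory=set)

    def report_faulty(self, server):
        # Remember the node so later assignments exclude it.
        self.faulty.add(server)

    def assign(self, count):
        # Reallocate from the servers that have not been reported as faulty.
        healthy = [s for s in self.servers if s not in self.faulty]
        if len(healthy) < count:
            raise RuntimeError("not enough healthy shuffle servers")
        return healthy[:count]


def run_stage(coordinator, write, replicas=2, max_attempts=3):
    """Run one stage, retrying the whole stage after a write failure."""
    for attempt in range(1, max_attempts + 1):
        assigned = coordinator.assign(replicas)
        try:
            for server in assigned:
                write(server)
            return attempt, assigned
        except ShuffleWriteError as err:
            # Report the faulty node; the next attempt gets a fresh assignment.
            coordinator.report_faulty(err.server)
    raise RuntimeError("stage failed after retries")


def demo():
    coordinator = Coordinator(servers=["s1", "s2", "s3"])

    def write(server):
        # Simulate a single permanently faulty server (assumption for the demo).
        if server == "s1":
            raise ShuffleWriteError(server)

    return run_stage(coordinator, write)
```

In this sketch, `demo()` fails on the first attempt because `s1` is in the assignment, reports `s1`, and succeeds on the second attempt with the remaining servers. Retrying the whole stage instead of a single write keeps the sketch aligned with the stage-level retry named in the description.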
