yl09099 opened a new pull request, #1129:
URL: https://github.com/apache/incubator-uniffle/pull/1129

   ### What changes were proposed in this pull request?
   
   1. During the shuffle write phase, faulty ShuffleServer nodes are reported 
and the ShuffleServer list is reallocated;
   2. A stage-level retry is triggered in Spark. The faulty ShuffleServer node 
is excluded and the servers are reallocated before the retry.
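
The two steps above can be pictured with a minimal, self-contained sketch. It is not the code in this PR: the class `WriteFailureReassignSketch` and its methods `reportFaultyServer`, `assignServers`, and `retryStage` are hypothetical names chosen for illustration, and server ids are plain strings. It only shows the general idea of excluding reported servers before a retried stage gets its new assignment.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Uniffle code base. */
class WriteFailureReassignSketch {
    // Every shuffle server known to the (hypothetical) assigner.
    private final List<String> allServers;
    // Servers that a write task has reported as faulty.
    private final Set<String> faultyServers = new LinkedHashSet<>();
    // Number of stage retries triggered so far.
    private int stageAttempt = 0;

    WriteFailureReassignSketch(List<String> allServers) {
        this.allServers = new ArrayList<>(allServers);
    }

    /** Record a faulty server; returns true if it had not been reported before. */
    boolean reportFaultyServer(String serverId) {
        return faultyServers.add(serverId);
    }

    /** Pick up to `count` servers in order, skipping every reported faulty server. */
    List<String> assignServers(int count) {
        List<String> assigned = new ArrayList<>();
        for (String server : allServers) {
            if (assigned.size() == count) {
                break;
            }
            if (!faultyServers.contains(server)) {
                assigned.add(server);
            }
        }
        if (assigned.isEmpty()) {
            throw new IllegalStateException("no healthy shuffle server available");
        }
        return assigned;
    }

    /** Reallocate first, then count the retry, so the retried stage uses the new list. */
    List<String> retryStage(int count) {
        List<String> reassigned = assignServers(count);
        stageAttempt++;
        return reassigned;
    }

    int getStageAttempt() {
        return stageAttempt;
    }

    public static void main(String[] args) {
        WriteFailureReassignSketch sketch =
                new WriteFailureReassignSketch(List.of("s1", "s2", "s3"));
        System.out.println(sketch.assignServers(2)); // [s1, s2]
        sketch.reportFaultyServer("s1");             // a write to s1 failed
        System.out.println(sketch.retryStage(2));    // [s2, s3]
    }
}
```

In this sketch the exclusion set only grows, so a server reported once is never assigned again; a real implementation would likely need some way to readmit recovered servers.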
   
   ### Why are the changes needed?
   
   Add fault tolerance to the shuffle write phase.
   
   Fix: #825 
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing UTs, plus a new integration test: 
RSSStageDynamicServerReWriteTest
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

