leixm opened a new pull request, #3320:
URL: https://github.com/apache/celeborn/pull/3320

   ### What changes were proposed in this pull request?
   In the dual-replica scenario, the reader should select which replica to 
read based on taskAttemptId when it is created. Typically, taskAttempt0 reads 
the primary partitionLocation, taskAttempt1 reads the replica 
partitionLocation, and so on. This improves fault tolerance, because a retried 
task does not start from the same copy that the previous attempt read.
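
   The selection rule described above can be sketched roughly as follows. This 
is a minimal illustration, not the code in this PR: the class and method names 
(`ReplicaSelectionSketch`, `pickIndex`, `pickLocation`) are hypothetical, 
locations are modeled as plain strings, and it assumes the attempt is 
identified by a per-task ordinal (0 for the first attempt).

```java
import java.util.List;

// Hypothetical sketch; these names are illustrative and are not Celeborn's API.
final class ReplicaSelectionSketch {

    // Returns the index of the location an attempt should read first.
    // With two locations: index 0 is the primary, index 1 is the replica.
    static int pickIndex(int attemptNumber, int locationCount) {
        if (locationCount <= 0) {
            throw new IllegalArgumentException("at least one location is required");
        }
        // floorMod keeps the result non-negative even for unexpected negative input.
        return Math.floorMod(attemptNumber, locationCount);
    }

    // Picks the location for an attempt by rotating through the available copies.
    static String pickLocation(int attemptNumber, List<String> locations) {
        return locations.get(pickIndex(attemptNumber, locations.size()));
    }

    public static void main(String[] args) {
        List<String> locations = List.of("primary", "replica");
        for (int attempt = 0; attempt < 4; attempt++) {
            System.out.println("attempt " + attempt + " -> "
                + pickLocation(attempt, locations));
        }
    }
}
```

   With two locations, even-numbered attempts start from the primary and 
odd-numbered attempts start from the replica, so consecutive attempts of the 
same task never start from the same copy.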
   
   
   ### Why are the changes needed?
   If the data of the primary partitionLocation is corrupted and 
CelebornInputStream#fillBuffer throws an exception, such as a decompression 
failure, the replica partitionLocation is not used when the task is retried. 
If taskAttempt1 read from the replica partitionLocation instead, it could run 
successfully.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Existing UTs.
   

