leixm opened a new pull request, #3320: URL: https://github.com/apache/celeborn/pull/3320
### What changes were proposed in this pull request?

In the dual-replica scenario, the reader should select which replica to read based on taskAttemptId when it is created. Usually, taskAttempt0 selects the primary partitionLocation, taskAttempt1 selects the replica partitionLocation, and so on. This provides better fault tolerance.

### Why are the changes needed?

If the data of the primary partitionLocation is corrupted and CelebornInputStream#fillBuffer throws an exception (for example, a decompression failure), the replica partitionLocation is not used when the task is retried. If taskAttempt1 read from the replica partitionLocation instead, it could run successfully.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.
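To illustrate the proposed selection rule, here is a minimal Java sketch. It assumes an attempt index that starts at 0 and increments on every retry, matching the taskAttempt0/taskAttempt1 description above. The `Location`, `ReplicatedPartition`, `select`, and `attemptsUntilSuccess` names are hypothetical and are not Celeborn's `PartitionLocation` API or the code in this patch. The sketch also simulates the retry scenario from the motivation: when the primary copy is corrupted, attempt 0 fails and attempt 1 succeeds by reading the replica.

```java
import java.util.Optional;
import java.util.Set;

final class ReplicaSelectionSketch {

    // Hypothetical stand-in for a partition location (NOT Celeborn's PartitionLocation).
    record Location(String host) {}

    // Hypothetical pairing of a primary location with an optional replica.
    record ReplicatedPartition(Location primary, Optional<Location> replica) {}

    // Assumed rule: even attempt indices (0, 2, ...) read the primary,
    // odd ones (1, 3, ...) read the replica; fall back to the primary
    // when the partition has no replica.
    static Location select(ReplicatedPartition partition, int attemptIndex) {
        if (attemptIndex % 2 == 1 && partition.replica().isPresent()) {
            return partition.replica().get();
        }
        return partition.primary();
    }

    // Simulates task retries: an attempt fails if it reads from a corrupted host.
    // Returns the index of the first successful attempt, or -1 if all attempts fail.
    static int attemptsUntilSuccess(ReplicatedPartition partition,
                                    Set<String> corruptedHosts,
                                    int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (!corruptedHosts.contains(select(partition, attempt).host())) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ReplicatedPartition p = new ReplicatedPartition(
                new Location("worker-a"),
                Optional.of(new Location("worker-b")));

        for (int attempt = 0; attempt < 4; attempt++) {
            System.out.println("attempt " + attempt + " -> " + select(p, attempt).host());
        }
        // Primary on worker-a is corrupted: attempt 0 fails, attempt 1 reads the replica.
        System.out.println("first successful attempt: "
                + attemptsUntilSuccess(p, Set.of("worker-a"), 4));
    }
}
```

Falling back to the primary when no replica exists keeps single-replica partitions readable on every attempt; whether the actual patch handles that case this way is an assumption of this sketch.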
