RexXiong commented on PR #3125: URL: https://github.com/apache/celeborn/pull/3125#issuecomment-2693369740
> will the spark app fallback to reading replicate shuffle data

We cannot fall back to the replica, because some sub-reducer tasks may already have read data successfully from the primary copy. If a task that encounters an error falls back to the replica, it may read duplicate data, because the primary and the replica store the data in a different order. In this scenario, triggering a stage rerun would be better.
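
To illustrate the duplicate-read problem, here is a toy sketch in Python. It is not Celeborn code: the batch names, the `read_range` helper, and the idea that each sub-reducer reads a fixed offset range of the partition are all assumptions made for the example.

```python
# Hypothetical batches of one partition. Primary and replica hold the same
# batches, but in a different order (illustrative data only).
primary = ["b0", "b1", "b2", "b3"]
replica = ["b1", "b3", "b0", "b2"]


def read_range(copy, start, end):
    """Return the batches a sub-reducer would read for offsets [start, end)."""
    return copy[start:end]


# Sub-reducer 0 succeeds against the primary copy.
task0 = read_range(primary, 0, 2)
# Sub-reducer 1 fails on the primary and (hypothetically) falls back to the replica.
task1 = read_range(replica, 2, 4)

combined = task0 + task1
# Batches that were read more than once across the two tasks.
duplicates = sorted({b for b in combined if combined.count(b) > 1})
# Batches that no task read at all.
missing = sorted(set(primary) - set(combined))

print("duplicates:", duplicates)
print("missing:", missing)
```

In this toy setup the mixed read both repeats one batch and skips another, which is why rerunning the stage so that every task reads from one consistent copy is the safer choice.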
