onebox-li commented on PR #2921:
URL: https://github.com/apache/celeborn/pull/2921#issuecomment-2595105756

Thanks @turboFei for this work.

We encountered an occasional problem recently. With loose speculation conditions, a speculative task was waiting for the `updateFileGroup` result when another attempt succeeded, so the task was interrupted. Because loading the file group failed, a fetch failure was reported to LifecycleManager. Newly started tasks then got the error below when they called `getShuffleId`:

```
unexpected! there is no finished map stage associated with appShuffleId xx
```

This may cause the whole job to fail. I think this PR is very helpful for solving this situation.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
