FMX commented on PR #2625:
URL: https://github.com/apache/celeborn/pull/2625#issuecomment-2227758853

   > > This will happen because shuffle client implementation is singleton.
   > 
   > The singleton pattern does not cause this problem. The core issue is that the task that fetches Filegroups was killed, and it placed invalid Filegroups in the reduceFileGroupsMap, which led to other tasks obtaining invalid Filegroups.
   
   Your Spark stage retried because of the `spark.task.maxFailures` config. The default value is 4, meaning the stage will retry if 4 tasks fail.
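
   For reference, a minimal sketch of how `spark.task.maxFailures` could be overridden at submit time. The value `8`, the class name, and the jar name are placeholders chosen for illustration, not values taken from this PR:

   ```shell
   # Hypothetical job submission; only the --conf line matters here.
   # spark.task.maxFailures defaults to 4 when not set.
   spark-submit \
     --conf spark.task.maxFailures=8 \
     --class com.example.MyJob \
     my-job.jar
   ```

   The same key can also be set in `spark-defaults.conf` or on the `SparkConf` used by the application.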


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
