waitinfuture commented on PR #2233:
URL: https://github.com/apache/incubator-celeborn/pull/2233#issuecomment-1902466185

> > ... if only one disk is used and the disk is bad, the worker still serves traffic, but all the requests will fail.
>
> I suppose the design of Celeborn can handle bad / run-out-of-space issues at runtime dynamically. Permanently excluding a bad disk at startup is too rough.

Currently, Celeborn does not retry creating the base working directories if creating them fails at startup. It only manages the inner directories inside the base working directories. So I think this PR is necessary. Maybe we can add check-and-retry for base working dirs in the future.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
