waitinfuture commented on PR #2233:
URL: 
https://github.com/apache/incubator-celeborn/pull/2233#issuecomment-1902466185

   > > ... if only one disk is used and the disk is bad, the worker still serves traffic, but all the requests will fail.
   > 
   > I suppose the design of Celeborn can handle bad / run-out-of-space issues at runtime dynamically. Permanently excluding a bad disk at startup is too rough.
   
   Currently, Celeborn does not retry creating a base working directory if creation fails at startup. It only manages the inner directories inside the base working directories. So I think this PR is necessary. Maybe we can add check-and-retry for base working dirs in the future.
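
For discussion, here is a rough sketch of what a check-and-retry for base working dirs could look like. It is not Celeborn code: the class name `BaseDirRetrySketch`, the methods `createWithRetry` and `usableBaseDirs`, and the `maxAttempts` / `backoffMillis` parameters are all hypothetical, and it only uses `java.nio.file` from the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of Celeborn.
final class BaseDirRetrySketch {

    // Try to create `dir` (including parents) up to `maxAttempts` times.
    // Returns true once the path is an existing, writable directory.
    public static boolean createWithRetry(Path dir, int maxAttempts, long backoffMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Files.createDirectories(dir);
                if (Files.isDirectory(dir) && Files.isWritable(dir)) {
                    return true;
                }
            } catch (IOException e) {
                // Creation failed (bad disk, permissions, ...); retry below.
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt flag and give up on this dir.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    // Keep only the candidate base dirs that could be created;
    // the rest are skipped rather than served from.
    public static List<Path> usableBaseDirs(List<Path> candidates,
                                            int maxAttempts,
                                            long backoffMillis) {
        List<Path> usable = new ArrayList<>();
        for (Path dir : candidates) {
            if (createWithRetry(dir, maxAttempts, backoffMillis)) {
                usable.add(dir);
            }
        }
        return usable;
    }
}
```

The same check could also be re-run periodically at runtime so that a base dir that failed at startup gets another chance later, but that scheduling is left out of the sketch.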

