s0nskar commented on PR #3294:
URL: https://github.com/apache/celeborn/pull/3294#issuecomment-2954850423

   @FMX  By users I meant the Celeborn admins, not the client.
   
   For worker graceful shutdown, the current default timeout is 600s, which 
exactly covers the check slots finished timeout (480s) plus the partition 
sorter timeout (120s), leaving no headroom. So either we increase the graceful 
shutdown default value slightly, or we ask the Celeborn admins to set the 
timeout slightly greater than (check slots finished timeout + partition sorter 
timeout).
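
The arithmetic for the graceful shutdown case can be sketched as follows. This is only an illustration, not Celeborn code: the class and method names are hypothetical, and the 5s margin used in the usage note is an assumed value.

```java
import java.time.Duration;

// Hypothetical helper, not part of Celeborn: checks how much slack the
// overall graceful shutdown timeout leaves beyond its two phases.
final class ShutdownBudget {
    // Time left once both phases have used their full timeouts.
    static Duration headroom(Duration total, Duration checkSlots, Duration sorter) {
        return total.minus(checkSlots.plus(sorter));
    }

    // Smallest overall timeout that still leaves `margin` of slack.
    static Duration recommended(Duration checkSlots, Duration sorter, Duration margin) {
        return checkSlots.plus(sorter).plus(margin);
    }
}
```

With the defaults quoted above (600s, 480s, 120s), `headroom` returns zero. With an assumed margin of 5s, `recommended` returns 605s.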
   
   For worker decommission, the force exit timeout is 600s. But in some cases 
the shutdown hook will not fully execute. Let's say 
`sendWorkerDecommissionToMaster()` took 2s: that time eats into the timeout, 
so the process can exit before the hook finishes and we miss the information 
on unreleased shuffles, which gets printed at the end of the decommission 
shutdown hook.
   
   Both cases can be handled by adding a small cooldown time.
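
To illustrate the decommission case, here is a simplified model of hook steps running against a hard exit deadline. It is a sketch under stated assumptions, not the actual Worker code: apart from `sendWorkerDecommissionToMaster`, the step names are hypothetical, the step durations are made up, and the 5s cooldown is an assumed value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Celeborn code: shutdown hook steps run in order,
// and the process is force-exited once `deadlineSeconds` has elapsed.
final class HookDeadlineModel {
    // Returns the names of the steps that finish by the deadline.
    static List<String> completedSteps(Map<String, Long> stepSeconds, long deadlineSeconds) {
        List<String> done = new ArrayList<>();
        long elapsed = 0;
        for (Map.Entry<String, Long> step : stepSeconds.entrySet()) {
            elapsed += step.getValue();
            if (elapsed > deadlineSeconds) {
                break; // force exit fires; this step and all later ones are lost
            }
            done.add(step.getKey());
        }
        return done;
    }

    // Example hook; all durations are assumed values for illustration only.
    static Map<String, Long> exampleHook() {
        Map<String, Long> steps = new LinkedHashMap<>();
        steps.put("sendWorkerDecommissionToMaster", 2L);
        steps.put("waitForShuffleRelease", 600L);   // hypothetical step name
        steps.put("logUnreleasedShuffles", 1L);     // hypothetical step name
        return steps;
    }
}
```

In this model, `completedSteps(exampleHook(), 600)` stops before the final logging step, while a deadline of 605 (600s plus the assumed 5s cooldown) lets all three steps finish.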

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
