FMX commented on PR #3294:
URL: https://github.com/apache/celeborn/pull/3294#issuecomment-2958080950

   > @FMX By users I meant the Celeborn admins, not the client.
   > 
   > For worker decommission, the force exit timeout is 600s. But in some cases 
the shutdown hook will not run to completion, for example if 
`sendWorkerDecommissionToMaster()` takes 2s. As a result, we miss the 
information on unreleased shuffles, which is printed at the end of the 
decommission shutdown hook.
   > 
   > Similarly, for worker graceful shutdown, the current default timeout is 
600s, which accounts for the check slots finished timeout (480s) and the 
partition sorter timeout (120s). So either we increase the graceful shutdown 
default value slightly, or we ask the Celeborn admins to set the timeout 
slightly greater than (check slots finished timeout + partition sorter timeout).
   > 
   > Both cases can be handled by adding a small cooldown time.
   
   In this scenario, why not just increase the graceful shutdown timeout 
slightly? It looks like with either approach the cluster admin will need to 
change the config file.
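
For reference, the timeout arithmetic from the quoted comment can be written out as a minimal sketch. The function name `shutdown_headroom_s` is hypothetical and not part of Celeborn. The 600s, 480s and 120s values are the defaults quoted above, and 610s is only an illustrative bumped value, not a recommendation.

```python
def shutdown_headroom_s(graceful_timeout_s, check_slots_timeout_s,
                        partition_sorter_timeout_s):
    """Seconds left for the rest of the shutdown hook if both phases
    use their full timeout budget (hypothetical helper)."""
    return graceful_timeout_s - (check_slots_timeout_s
                                 + partition_sorter_timeout_s)


# Defaults quoted in the thread: the two phases consume the whole budget.
default_headroom = shutdown_headroom_s(600, 480, 120)

# Illustrative value only: raising the overall timeout slightly leaves
# time for the final steps of the hook, such as logging.
bumped_headroom = shutdown_headroom_s(610, 480, 120)

print(default_headroom, bumped_headroom)
```

With the quoted defaults the headroom is zero, which is why either a slightly larger graceful shutdown timeout or a separate cooldown would need to be configured.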


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

