FMX commented on PR #3294: URL: https://github.com/apache/celeborn/pull/3294#issuecomment-2958080950
> @FMX By users I meant the Celeborn admins, not the client.
>
> For worker decommission, the force exit timeout is 600s. But in some cases the shutdown hook will not fully execute: let's say `sendWorkerDecommissionToMaster()` took 2s. Because of that, we will miss the information on unreleased shuffles, which gets printed at the end of the decommission shutdown hook.
>
> Similarly, for worker graceful shutdown, the current default timeout is 600s, which accounts for the check slots finished timeout (480s) and the partition sorter timeout (120s). So either we can increase the graceful shutdown default value slightly, or ask the Celeborn admins to increase the timeout to slightly greater than (check slots finished timeout + partition sorter timeout).
>
> Both of the cases can be handled by adding a small cooldown time.

In this scenario, why not increase the graceful shutdown time slightly? It looks like in both approaches the cluster admin will need to change the config file.
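To make the arithmetic in this discussion concrete, here is a minimal sketch in Python (not Celeborn code). The class, field, and function names are hypothetical, as is the 10s margin; the default values are the ones quoted in this thread. It shows that the two phase timeouts already consume the whole 600s budget, so anything that runs after them in the shutdown hook has no time left.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShutdownTimeouts:
    """Timeouts in seconds; defaults mirror the values quoted in this thread."""
    graceful_shutdown: int = 600      # overall graceful shutdown timeout
    check_slots_finished: int = 480   # check slots finished timeout
    partition_sorter: int = 120       # partition sorter timeout

    def headroom(self) -> int:
        """Seconds left if both phases run up to their full timeout."""
        return self.graceful_shutdown - (
            self.check_slots_finished + self.partition_sorter
        )


def suggested_graceful_timeout(t: ShutdownTimeouts, margin: int) -> int:
    """Smallest overall timeout that leaves `margin` seconds of slack.

    `margin` is an assumed value chosen by the admin, not a Celeborn default.
    """
    return t.check_slots_finished + t.partition_sorter + margin


defaults = ShutdownTimeouts()
print(defaults.headroom())                              # 0
print(suggested_graceful_timeout(defaults, margin=10))  # 610
```

With the quoted defaults the headroom is zero, which is why either a slightly larger graceful shutdown timeout or a separate cooldown would require a config change by the admin.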
