[jira] [Created] (SAMZA-2692) ClusterBasedJobCoordinator does not shut down cleanly on SIGTERM

Cameron Lee (Jira) Thu, 16 Sep 2021 11:13:06 -0700

Cameron Lee created SAMZA-2692:
----------------------------------

             Summary: ClusterBasedJobCoordinator does not shut down cleanly on 
SIGTERM
                 Key: SAMZA-2692
                 URL: https://issues.apache.org/jira/browse/SAMZA-2692
             Project: Samza
          Issue Type: Bug
            Reporter: Cameron Lee



ClusterBasedJobCoordinator has no shutdown hook that tells it to stop, so 
SIGTERM does not trigger a clean shutdown.
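
For illustration only, one way such a hook could look is sketched below. This is 
not the actual Samza code or the eventual fix: the Coordinator class, its 
run()/stop()/awaitFinished() methods, and the 5 second timeout are all 
hypothetical stand-ins. The only real API used is the JDK's 
Runtime.addShutdownHook, which the JVM invokes when it receives SIGTERM.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ShutdownHookSketch {

    /** Hypothetical stand-in for the coordinator; NOT Samza's actual class or API. */
    public static class Coordinator {
        private final CountDownLatch stopRequested = new CountDownLatch(1);
        private final CountDownLatch finished = new CountDownLatch(1);

        /** Blocks until stop() is called, then runs cleanup and marks itself finished. */
        public void run() throws InterruptedException {
            try {
                stopRequested.await();
                // Real cleanup (releasing resources, reporting final status) would go here.
            } finally {
                finished.countDown();
            }
        }

        /** Idempotent: counting down a latch that is already at zero is a no-op. */
        public void stop() {
            stopRequested.countDown();
        }

        public boolean isStopRequested() {
            return stopRequested.getCount() == 0;
        }

        /** Returns true if run() finished within the timeout. */
        public boolean awaitFinished(long timeoutMs) throws InterruptedException {
            return finished.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    /**
     * Builds the hook thread. The hook requests a stop and then waits (bounded) for
     * run() to finish, because the JVM halts as soon as all shutdown hooks return.
     */
    public static Thread newShutdownHook(Coordinator coordinator, long timeoutMs) {
        return new Thread(() -> {
            coordinator.stop();
            try {
                coordinator.awaitFinished(timeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "coordinator-shutdown-hook");
    }

    public static void main(String[] args) throws InterruptedException {
        Coordinator coordinator = new Coordinator();
        // Assumed timeout of 5 seconds; it should stay below the SIGKILL delay.
        Runtime.getRuntime().addShutdownHook(newShutdownHook(coordinator, 5_000));
        coordinator.run(); // returns once SIGTERM (or any JVM shutdown) fires the hook
    }
}
```

The hook waits for run() to finish rather than only calling stop(), since 
otherwise the JVM could exit before cleanup on the main thread completes.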

On YARN, the process is sent SIGTERM first, followed by a SIGKILL after a 
timeout ("yarn.nodemanager.sleep-delay-before-sigkill.ms") if the process 
has not exited. The job coordinator process therefore does exit, but the 
shutdown is unclean. The shutdown is also slower than necessary, since YARN 
has to wait out the timeout before sending SIGKILL instead of the process 
exiting normally right after the SIGTERM.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
