Cameron Lee created SAMZA-2692:
----------------------------------
Summary: ClusterBasedJobCoordinator does not shut down cleanly on
SIGTERM
Key: SAMZA-2692
URL: https://issues.apache.org/jira/browse/SAMZA-2692
Project: Samza
Issue Type: Bug
Reporter: Cameron Lee
ClusterBasedJobCoordinator does not register a shutdown hook that stops it, so
SIGTERM does not trigger a clean shutdown of ClusterBasedJobCoordinator.
YARN sends SIGTERM first, then follows up with SIGKILL after a timeout
("yarn.nodemanager.sleep-delay-before-sigkill.ms") if the process has not
exited. The job coordinator process therefore does exit, but the shutdown is
unclean. It is also slower than necessary: YARN has to wait out the full
timeout before sending SIGKILL, whereas the process could exit normally right
after the SIGTERM.
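
For illustration, here is a minimal sketch of the kind of shutdown hook that
could address this. It is not Samza's actual code: the Coordinator class, its
run/stop/awaitFinished methods, and the 5000 ms wait are hypothetical stand-ins.
The only real APIs used are Runtime.addShutdownHook and java.util.concurrent.
The hook asks the coordinator to stop and then waits a bounded time for its run
loop to finish, so the JVM can exit on SIGTERM without waiting for SIGKILL.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class ShutdownHookSketch {

    /** Hypothetical stand-in for the job coordinator; not Samza's real class. */
    public static class Coordinator {
        private final AtomicBoolean stopRequested = new AtomicBoolean(false);
        private final CountDownLatch finished = new CountDownLatch(1);

        /** Runs until stop() is called, then signals that cleanup is done. */
        public void run() throws InterruptedException {
            try {
                while (!stopRequested.get()) {
                    Thread.sleep(50); // placeholder for real coordination work
                }
            } finally {
                // Real cleanup (releasing resources, final status) would go here.
                finished.countDown();
            }
        }

        public void stop() {
            stopRequested.set(true);
        }

        public boolean isStopRequested() {
            return stopRequested.get();
        }

        /** Returns true if run() finished within the given timeout. */
        public boolean awaitFinished(long timeoutMs) throws InterruptedException {
            return finished.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    /** Builds a hook thread that requests a stop and waits for run() to end. */
    public static Thread newShutdownHook(Coordinator coordinator, long timeoutMs) {
        return new Thread(() -> {
            coordinator.stop();
            try {
                coordinator.awaitFinished(timeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "coordinator-shutdown-hook");
    }

    public static void main(String[] args) throws InterruptedException {
        Coordinator coordinator = new Coordinator();
        // The JVM runs registered hooks when it receives SIGTERM.
        // The 5000 ms timeout is an arbitrary illustrative value.
        Runtime.getRuntime().addShutdownHook(newShutdownHook(coordinator, 5000));
        coordinator.run();
    }
}
```

In a sketch like this, the hook's wait would presumably need to be shorter than
the configured SIGKILL delay, otherwise the process could still be killed
before cleanup completes.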