abhishekshivanna opened a new pull request #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown
URL: https://github.com/apache/samza/pull/1240

**Symptom:** When a container heartbeat fails, the container shutdown sequence is triggered and the container is never restarted.

**Cause:** When a container heartbeat fails, the container shutdown sequence exits the container with an exit code of `0`, which marks the container as `Completed` and prevents the JobCoordinator from restarting it. The bug is caused by `containerException` being overwritten with the value returned by `listener.getContainerException` without checking whether `containerException` was already set by the heartbeat monitor.

**Changes:** The container can shut down exceptionally in two ways:
1) Exception in the container
2) Heartbeat expired

In both paths, ContainerLaunchUtil previously expected a shared static variable to hold the exception. This change removes the static variable, checks each path explicitly, and exits with code `1` in both cases.
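
The following is a minimal, self-contained sketch of the failure mode and the fix described above. It is not the actual Samza code: the class `ExitCodeSketch`, the nested `Listener` type, and the methods `buggyExitCode` and `fixedExitCode` are hypothetical names chosen for illustration. Only the idea (an unconditional overwrite of `containerException` versus an explicit check of each failure path) comes from the description.

```java
public class ExitCodeSketch {

    /** Hypothetical stand-in for the listener that records a container failure. */
    public static final class Listener {
        private final Throwable containerException;

        public Listener(Throwable containerException) {
            this.containerException = containerException;
        }

        public Throwable getContainerException() {
            return containerException;
        }
    }

    /**
     * Models the old behavior: one shared variable holds the exception, and the
     * listener's value overwrites whatever the heartbeat monitor stored there.
     */
    public static int buggyExitCode(Throwable heartbeatException, Listener listener) {
        Throwable containerException = heartbeatException;     // set by the heartbeat monitor
        containerException = listener.getContainerException(); // unconditional overwrite
        return containerException == null ? 0 : 1;
    }

    /** Models the fix: each failure path is checked explicitly. */
    public static int fixedExitCode(boolean heartbeatExpired, Listener listener) {
        if (listener.getContainerException() != null) {
            return 1; // path 1: exception in the container
        }
        if (heartbeatExpired) {
            return 1; // path 2: heartbeat expired
        }
        return 0;     // clean shutdown
    }

    public static void main(String[] args) {
        Listener noContainerFailure = new Listener(null);
        Throwable heartbeatFailure = new RuntimeException("heartbeat expired");

        // Heartbeat failed, but the container itself recorded no exception.
        System.out.println("buggy: " + buggyExitCode(heartbeatFailure, noContainerFailure));
        System.out.println("fixed: " + fixedExitCode(true, noContainerFailure));
    }
}
```

In this sketch, a heartbeat failure with no container exception yields exit code `0` on the old path and `1` on the fixed path, which is the difference that lets the JobCoordinator treat the container as failed rather than `Completed`.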
