abhishekshivanna opened a new pull request #1240: SAMZA-2423: Heartbeat failure causes incorrect container shutdown
URL: https://github.com/apache/samza/pull/1240
 
 
   **Symptom:** 
   When a container heartbeat fails, the container shutdown
   sequence is triggered, but the container is never restarted.
   
   **Cause:**
   When a container heartbeat fails, the container shutdown
   sequence exits the container with an exit code of `0`, which
   marks the container as `Completed` and prevents the JobCoordinator
   from restarting it.
   The bug is caused by `containerException` being overwritten with the value
   returned by `listener.getContainerException` without first checking whether
   `containerException` was already set by the heartbeat monitor.
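A minimal, hypothetical Java sketch of the overwrite described above. The names `OverwriteBugSketch`, `exitCodeBuggy`, and `exitCodeGuarded` are illustrative assumptions, not Samza's actual code; the static field only mirrors `containerException` in spirit. It shows how an unconditional assignment from the listener can replace a failure already recorded by the heartbeat monitor, turning what should be a non-zero exit into `0`.

```java
// Hypothetical, simplified model of the overwrite bug; not Samza's actual code.
public class OverwriteBugSketch {
    // Shared static field, assumed to be set by a heartbeat monitor thread on expiry.
    static volatile Throwable containerException = null;

    // Buggy version: unconditionally overwrites whatever the heartbeat monitor stored.
    static int exitCodeBuggy(Throwable listenerException) {
        containerException = listenerException; // may replace a non-null value with null
        return containerException == null ? 0 : 1;
    }

    // Guarded version: only adopts the listener's exception if nothing was recorded yet.
    static int exitCodeGuarded(Throwable listenerException) {
        if (containerException == null) {
            containerException = listenerException;
        }
        return containerException == null ? 0 : 1;
    }

    public static void main(String[] args) {
        // Heartbeat monitor records a failure; the container itself reports no exception.
        containerException = new RuntimeException("heartbeat expired");
        System.out.println("buggy exit code:   " + exitCodeBuggy(null));   // prints 0
        containerException = new RuntimeException("heartbeat expired");
        System.out.println("guarded exit code: " + exitCodeGuarded(null)); // prints 1
    }
}
```

The guarded variant is only one possible fix; the change described below instead removes the shared static field altogether.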
   
   **Changes:** 
   The container can shut down exceptionally in the following two ways:
   1) An exception in the container
   2) An expired heartbeat
   On both paths, ContainerLaunchUtil previously relied on a
   shared static variable to hold the exception. This change
   removes the static variable, checks each path explicitly,
   and exits with code `1` in both cases.
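The exit-code decision after the change can be sketched as follows, assuming each shutdown path reports its own outcome instead of writing to a shared static field. `ExitCodeSketch` and `decideExitCode` are hypothetical names; the real logic lives in ContainerLaunchUtil and may be structured differently.

```java
import java.util.Optional;

// Hypothetical sketch of the exit-code decision; names are illustrative, not Samza's API.
public class ExitCodeSketch {
    static final int SUCCESS = 0;
    static final int FAILURE = 1;

    // Each shutdown path is checked explicitly; no shared static state is involved.
    static int decideExitCode(Optional<Throwable> containerException, boolean heartbeatExpired) {
        if (containerException.isPresent()) {
            return FAILURE; // path 1: exception in the container
        }
        if (heartbeatExpired) {
            return FAILURE; // path 2: heartbeat expired
        }
        return SUCCESS; // clean shutdown
    }

    public static void main(String[] args) {
        System.out.println(decideExitCode(Optional.empty(), false));                            // 0
        System.out.println(decideExitCode(Optional.of(new RuntimeException("boom")), false));   // 1
        System.out.println(decideExitCode(Optional.empty(), true));                             // 1
    }
}
```

A launcher would then pass the returned code to `System.exit`, so that either failure path surfaces as a non-zero exit instead of a `Completed` container.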

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure.


With regards,
Apache Git Services
