Hi Everyone,

To address the orphaned (leaked) containers left behind when the YARN
NodeManager crashes, I have written a SEP describing the design for a
heartbeat between the containers and the job coordinator:
https://cwiki.apache.org/confluence/display/SAMZA/SEP-3%3A+Heart-beat+mechanism+between+JobCoordinator+and+all+running+containers

Please take a look and provide feedback. I would also appreciate help
designing a way to propagate the error up from SamzaContainer so that
the container exits with a non-zero exit code.

Thanks,
Abhishek
