Prateek Maheshwari <prateek...@gmail.com> writes:

Hi Tom,

This would depend on what your k8s container orchestration logic looks
like. For example, in YARN, 'status' returns 'not running' after 'start'
until all the containers requested from the AM are 'running'. We also
leverage YARN to restart containers/job automatically on failures (within
some bounds). Additionally, we set up a monitoring alert that goes off if
the number of running containers stays below the number of expected
containers for an extended period of time (~5 minutes).
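
Programmatically, the equivalent post-start check would look roughly like
the sketch below (just an illustration; the timeout and poll interval are
arbitrary, and it assumes Samza's ApplicationRunner/ApplicationStatus API):

    import java.time.Duration;

    import org.apache.samza.job.ApplicationStatus.StatusCode;
    import org.apache.samza.runtime.ApplicationRunner;

    /** Sketch: wait until the submitted app reports Running, or give up. */
    public final class StartupCheck {

      // Poll runner.status() until all requested containers are up (Running),
      // the job has already failed (UnsuccessfulFinish), or the deadline passes.
      public static void awaitRunning(ApplicationRunner runner, Duration timeout)
          throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeout.toMillis();
        while (System.currentTimeMillis() < deadline) {
          StatusCode code = runner.status().getStatusCode();
          if (code == StatusCode.Running) {
            return; // everything requested from the AM is running
          }
          if (code == StatusCode.UnsuccessfulFinish) {
            throw new IllegalStateException("Job failed before reaching Running");
          }
          Thread.sleep(10_000); // poll every 10s; interval is arbitrary
        }
        throw new IllegalStateException("Job did not reach Running within " + timeout);
      }
    }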

Are you saying that you noticed that the LocalApplicationRunner status
returns 'running' even if its stream processor / SamzaContainer has stopped
processing?


Yeah, this is what I mean. We have a health check for the overall
ApplicationStatus, but if the containers enter a failed state, that
doesn't result in a shutdown of the runner itself. An example from last
night: Kafka became unavailable at some point and Samza failed to write
checkpoints for a while, ultimately leading to container failures. The
last log line is:

o.a.s.c.SamzaContainer - Shutdown is no-op since the container is already in state: FAILED

This doesn't cause the Pod to be killed, though, so we just silently
stop processing events. How do you determine the number of expected
containers? Or are you speaking of containers in terms of YARN and not
Samza processors?
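
For reference, our liveness check is essentially the sketch below
(simplified; the HTTP wiring and the /healthz path are made up here, only
the status() call mirrors what we actually do):

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    import com.sun.net.httpserver.HttpServer;

    import org.apache.samza.job.ApplicationStatus.StatusCode;
    import org.apache.samza.runtime.ApplicationRunner;

    /** Sketch of the liveness endpoint the Kubernetes probe hits. */
    public final class LivenessEndpoint {

      // Serves /healthz: 200 while the runner reports Running, 503 otherwise.
      // The trouble is that status() keeps reporting Running even after the
      // SamzaContainer has moved to FAILED, so the probe never trips.
      public static void start(ApplicationRunner runner, int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/healthz", exchange -> {
          boolean healthy = runner.status().getStatusCode() == StatusCode.Running;
          byte[] body = (healthy ? "OK" : "UNHEALTHY").getBytes(StandardCharsets.UTF_8);
          exchange.sendResponseHeaders(healthy ? 200 : 503, body.length);
          try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
          }
        });
        server.start();
      }
    }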


- Prateek

On Fri, Mar 15, 2019 at 7:26 AM Tom Davis <t...@recursivedream.com> wrote:

I'm using the LocalApplicationRunner and had added a liveness check
around the `status` method. The app is running in Kubernetes so, in
theory, it could be restarted if exceptions happened during processing.
However, it seems that "container failure" is divorced from "app
failure" because the app continues to run even after all the task
containers have shut down. Is there a better way to check for
application health? Is there a way to shut down the application if all
containers have failed? Should I simply ensure exceptions never escape
operators? Thanks!
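
For instance, is something like the sketch below the intended pattern,
i.e. block on waitForFinish() in the main thread and let the JVM exit so
Kubernetes restarts the Pod? (Rough sketch only; it assumes the runner
exposes waitForFinish() and status().)

    import org.apache.samza.job.ApplicationStatus.StatusCode;
    import org.apache.samza.runtime.ApplicationRunner;

    /** Sketch: let the JVM exit when the job finishes so Kubernetes restarts the Pod. */
    public final class ExitOnFinish {

      // Called after runner.run(): blocks until the job finishes for any
      // reason, then exits non-zero on failure so the Pod gets restarted
      // instead of lingering with dead containers.
      public static void blockAndExit(ApplicationRunner runner) {
        runner.waitForFinish();
        boolean succeeded = runner.status().getStatusCode() == StatusCode.SuccessfulFinish;
        System.exit(succeeded ? 0 : 1);
      }
    }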
