As we get ready to go to production with our k8s-based system, we're trying to pin down exactly how we'll handle monitoring and alerting for it. We can easily collect many of the metrics we need (using kube-state-metrics to feed Prometheus and/or Datadog) and alert on those.

However, there is other important k8s-related information about our system that we need to be able to access, monitor, and alert on, most notably things like:

* If a container crashes and is restarted by k8s

* If k8s kills a container and restarts it (e.g., due to exceeding CPU or memory limits, or due to repeated liveness check failures)

* If k8s kills a container but cannot restart it

* If an entire pod crashes and is restarted by k8s

etc.


How would one go about gaining access to those k8s-related events in an automated fashion, and setting up monitoring/alerting based on them?
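
To make the question concrete, here is a minimal sketch of the kind of check we have in mind. It assumes the pod objects come from something like `kubectl get pods -o json` and relies on the `containerStatuses` fields (`restartCount`, `lastState.terminated.reason`, `state.waiting.reason`) in the pod status. The function name `classify_containers` and the wording of the findings are hypothetical, purely for illustration, and this is not meant as a finished monitoring solution.

```python
def classify_containers(pod):
    """Return (container_name, finding) tuples for one pod object (a dict).

    Hypothetical helper: it only looks at status.containerStatuses and
    ignores init containers and pod-level conditions.
    """
    findings = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        name = cs.get("name", "<unknown>")
        restarts = cs.get("restartCount", 0)
        # lastState.terminated holds the reason for the previous exit,
        # e.g. "OOMKilled" or "Error"; it may be absent.
        last_term = cs.get("lastState", {}).get("terminated") or {}
        # state.waiting.reason is "CrashLoopBackOff" while k8s backs off
        # between repeated failed restarts.
        waiting = cs.get("state", {}).get("waiting") or {}

        if waiting.get("reason") == "CrashLoopBackOff":
            findings.append((name, "not restarting cleanly: CrashLoopBackOff"))
        elif restarts > 0:
            reason = last_term.get("reason", "unknown")
            findings.append((name, f"restarted {restarts}x, last reason: {reason}"))
    return findings
```

Each element of the `items` array in the `kubectl` JSON output could be passed to this function, but polling like that is exactly what we would like to avoid if there is a better, event-driven way.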

Thanks,

DR
