As we're getting ready to go to production with our k8s-based system,
we're trying to pin down exactly how we're going to do all the needed
monitoring/alerting for it. We can easily collect many of the metrics
we need (using kube-state-metrics to feed into Prometheus, and/or
Datadog) and alert off of those.
However, there's other important k8s-related info about our system that
we need to be able to access, monitor, and alert on, most notably things
like:
* If a container crashes and is restarted by k8s
* If k8s kills a container and restarts it (e.g., due to exceeding its
memory limit, or due to repeated failure of a liveness check)
* If k8s kills a container but cannot restart it
* If an entire pod crashes and is restarted by k8s
etc.
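To make the cases above concrete, here is a rough sketch of how we believe they surface in a pod's status, for example in the output of `kubectl get pods -o json`. It is not something we have running. It reads the `restartCount`, `lastState.terminated.reason`, and `state.waiting.reason` fields of each entry in `status.containerStatuses`. The function name `classify_pod` and the wording of the findings are our own invention for illustration, and the sample pod is made up.

```python
from typing import Dict, List


def classify_pod(pod: Dict) -> List[str]:
    """Return human-readable findings for one pod object (parsed JSON).

    Hypothetical helper: only inspects status.containerStatuses and makes
    no API calls itself.
    """
    findings: List[str] = []
    name = pod.get("metadata", {}).get("name", "<unknown>")
    statuses = pod.get("status", {}).get("containerStatuses") or []

    for cs in statuses:
        container = cs.get("name", "<unknown>")
        restarts = cs.get("restartCount", 0)
        # lastState.terminated describes the previous run of the container.
        last_term = (cs.get("lastState") or {}).get("terminated") or {}
        # state.waiting is set while the container is not currently running.
        waiting = (cs.get("state") or {}).get("waiting") or {}

        if last_term.get("reason") == "OOMKilled":
            findings.append(
                f"{name}/{container}: killed for exceeding its memory limit "
                f"(restarts={restarts})"
            )
        elif restarts > 0:
            findings.append(
                f"{name}/{container}: restarted {restarts} time(s), "
                f"last exit code {last_term.get('exitCode')}"
            )

        if waiting.get("reason") == "CrashLoopBackOff":
            findings.append(
                f"{name}/{container}: not staying up (CrashLoopBackOff)"
            )

    return findings


# Made-up example pod, shaped like one item from `kubectl get pods -o json`.
sample_pod = {
    "metadata": {"name": "web-0"},
    "status": {
        "containerStatuses": [
            {
                "name": "app",
                "restartCount": 3,
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            },
            {
                "name": "sidecar",
                "restartCount": 0,
                "lastState": {},
                "state": {"running": {}},
            },
        ]
    },
}

if __name__ == "__main__":
    for line in classify_pod(sample_pod):
        print(line)
```

A check like this only sees the current status snapshot, so it would have to be polled. It also does not tell us why a liveness check failed, which is part of why we are asking about the events themselves.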
How would one go about gaining access to those k8s-related events in
an automated fashion, and setting up monitoring/alerting off of those?
Thanks,
DR
--
You received this message because you are subscribed to the Google Groups "Kubernetes
user discussion and Q&A" group.