I'm trying to make sure that, as I deploy new services on our
cluster, failures and restarts get handled in the way that's best
for resiliency and uptime.
I'm simplifying things a bit, but if a piece of code running inside a
container crashes, there are more or less two possibilities: 1) a bug in
the code (and/or it's trying to process data that causes an error), or 2)
problems with the hardware/network (full disk, bad disk, network outage,
etc.). If the issue is #1, then it doesn't matter whether you restart
the container or the pod. But if the issue is #2, then restarting the
pod (i.e., on another host) would fix the problem, while restarting the
container probably wouldn't.
So I guess this is sort of alluding to a bigger question, then: does
k8s have any ability to detect if a host is having hardware problems
and, if so, avoid scheduling new pods on it, move pods off of it if
their containers are crashing, etc.?
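(For what it's worth, Kubernetes does do some of this at the node level: the kubelet reports node conditions such as Ready and DiskPressure, the node controller evicts pods from nodes that stay NotReady, and with taint-based evictions you can tune per pod how quickly it gets moved. A sketch of such a toleration follows; the taint key names are the current upstream ones and may differ by Kubernetes version, and the 60-second value is just an example:)

```yaml
# Pod spec fragment: ask to be evicted from a failed node after 60s
# instead of the default ~5 minutes. Assumes taint-based evictions
# are enabled on the cluster; key names vary by k8s version.
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 60
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 60
```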
I've done a lot of work with big data systems previously and, IIRC,
Hadoop (for example) used to employ procedures to detect if a disk was
bad, if many tasks on a particular node kept crashing, etc., and it
would start to blacklist those nodes. My thinking was that k8s worked
similarly - i.e., if all containers in a pod terminated unsuccessfully,
then terminate the pod; if a particular node has many pods terminating
unsuccessfully, then stop launching new pods there, etc.
Perhaps I'm misunderstanding / assuming incorrectly, though.
Thanks,
DR
On 2017-10-27 4:35 pm, 'Tim Hockin' via Kubernetes user discussion and
Q&A wrote:
What Rodrigo said - what problem are you trying to solve?
The pod lifecycle is defined as restart-in-place, today. Nothing you
can do inside your pod, except deleting it from the apiserver, will do
what you're asking. It doesn't seem too far-fetched that a pod could
exit and "ask for a different node", but we're not going there without
a solid, solid, solid use case.
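(For the archives: the "delete it from the apiserver" escape hatch can be done from inside the pod itself. This is only a sketch - it requires a running cluster, the standard serviceaccount mount, and RBAC permission for the pod's service account to delete pods:)

```shell
# Hedged sketch: from inside the pod, delete your own Pod object via
# the apiserver. The ReplicaSet then creates a replacement, which the
# scheduler may place on a different node.
SA=/var/run/secrets/kubernetes.io/serviceaccount
NS=$(cat $SA/namespace)
curl -sS --cacert $SA/ca.crt \
     -H "Authorization: Bearer $(cat $SA/token)" \
     -X DELETE \
     "https://kubernetes.default.svc/api/v1/namespaces/$NS/pods/$(hostname)"
```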
On Fri, Oct 27, 2017 at 1:23 PM, Rodrigo Campos <rodrig...@gmail.com>
wrote:
I don't think it is configurable.
But I don't really see what you are trying to solve; maybe there is
another way to achieve it? If you are running a pod with a single
container, what is the problem with restarting the container, when
appropriate, instead of the whole pod?
I mean, you would need to handle the case where some container in the
pod crashed or stalled, right? The liveness probe runs periodically,
but until the next check happens, the container can be hung or
something. So even if the whole pod is restarted, that problem is
still there; restarting the whole pod won't solve it. So probably my
guess is not correct about what you are trying to solve.
So, sorry, but can I ask again: what is the problem you want to
address? :)
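(Since liveness probes came up: they are configured per container, and on failure the kubelet restarts the container in place. A minimal sketch - the /healthz path, port, and timings here are illustrative, not from the original poster's setup:)

```yaml
# Container spec fragment: the kubelet restarts this container in
# place after 3 consecutive probe failures. Values are examples only.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3
```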
On Friday, October 27, 2017, David Rosenstrauch <dar...@darose.net>
wrote:
I was speaking to our admin here, and he suggested that running a
health-check container inside the same pod might work. Does anyone
agree that that would be a good (or even preferred) approach?
Thanks,
DR
On 2017-10-27 11:41 am, David Rosenstrauch wrote:
I have a pod which runs a single container. The pod is being run
under a ReplicaSet (which starts a new pod to replace a pod that's
terminated).
What I'm seeing is that when the container within that pod terminates,
instead of the pod terminating too, the pod stays alive and just
restarts the container in it. However, I'm thinking that what would
make more sense would be for the entire pod to terminate in this
situation, and then another would automatically start to replace it.
Does this seem sensible? If so, how would one accomplish this with
k8s? Changing the restart policy setting doesn't seem to be an
option. The restart policy (e.g. restartPolicy: Always) seems to
apply only to whether to restart a pod; the decision about whether to
restart a container in a pod doesn't seem to be configurable. (At
least not that I could see.)
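(Editor's note on the restart policy, since it's the crux of the question: restartPolicy is actually a pod-level field that governs how the kubelet restarts the pod's containers in place; it never relocates the pod, and ReplicaSets only accept Always. A sketch, with illustrative names and image:)

```yaml
# Bare pod sketch. restartPolicy is pod-level and controls in-place
# *container* restarts: Always | OnFailure | Never. A ReplicaSet's
# pod template must use Always; OnFailure/Never suit bare pods/Jobs.
apiVersion: v1
kind: Pod
metadata:
  name: example-app        # illustrative name
spec:
  restartPolicy: Always
  containers:
  - name: app
    image: example/app:1.0  # illustrative image
```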
Would appreciate any guidance anyone could offer here.
Thanks,
DR
--
You received this message because you are subscribed to the Google Groups "Kubernetes user discussion and Q&A" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-users+unsubscr...@googlegroups.com.
To post to this group, send email to kubernetes-users@googlegroups.com.
Visit this group at https://groups.google.com/group/kubernetes-users.
For more options, visit https://groups.google.com/d/optout.