I'm trying to make sure that, as I deploy new services on our cluster, failures/restarts get handled in a way that's optimal for resiliency/uptime.

I'm simplifying things a bit, but if a piece of code running inside a container crashes, there are more or less two possibilities: 1) a bug in the code (and/or it's trying to process data that triggers an error), or 2) a problem with the hardware/network (full disk, bad disk, network outage, etc.). If the issue is #1, then it doesn't matter whether you restart the container or the pod. But if the issue is #2, then restarting the pod (i.e., on another host) would fix the problem, while restarting the container in place probably wouldn't.

So I guess this is sort of alluding to a bigger question, then: does k8s have any ability to detect if a host is having hardware problems and, if so, avoid scheduling new pods on it, move pods off of it if their containers keep crashing, etc.?

I've done a lot of work with big data systems previously and, IIRC, Hadoop (for example) used to employ procedures to detect whether a disk was bad, whether many tasks on a particular node kept crashing, etc., and it would start to blacklist those nodes. My thinking was that k8s worked similarly - i.e., if all containers in a pod terminated unsuccessfully, terminate the pod; if a particular node has many pods terminating unsuccessfully, stop launching new pods on it, etc. Perhaps I'm misunderstanding / assuming incorrectly, though.
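For what it's worth, the behavior I was imagining is roughly what the toleration snippet below expresses: "if the node goes bad, evict my pod after a grace period so it gets rescheduled elsewhere." This is just an illustrative sketch; I'm not sure whether this is actually how k8s models node failures, and the key names and values here are my guesses.

tolerations:
- key: node.kubernetes.io/not-ready      # illustrative key, may not be exact
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300                 # example grace period before eviction
- key: node.kubernetes.io/unreachable    # illustrative key, may not be exact
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 300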

Thanks,

DR

On 2017-10-27 4:35 pm, 'Tim Hockin' via Kubernetes user discussion and Q&A wrote:
What Rodrigo said - what problem are you trying to solve?

The pod lifecycle is defined as restart-in-place, today.  Nothing you
can do inside your pod, except deleting it from the apiserver, will do
what you're asking.  It doesn't seem too far-fetched that a pod could
exit and "ask for a different node", but we're not going there without
a solid solid solid use case.

On Fri, Oct 27, 2017 at 1:23 PM, Rodrigo Campos <rodrig...@gmail.com> wrote:
I don't think it is configurable.

But I don't really see what you are trying to solve; maybe there is another way to achieve it? If you are running a pod with a single container, what is the problem with the container being restarted when appropriate, instead of the whole pod?

I mean, you would still need to handle the case where some container in the pod has crashed or stalled, right? The liveness probe runs periodically, but until the next check happens, the container can be hung or otherwise stuck. That window exists whether the container or the whole pod gets restarted, so restarting the whole pod won't solve it. So my guess about what you are trying to solve is probably not correct.
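Just to illustrate the timing issue: with a probe like the one below (the path, port, and numbers are only an example), the kubelet only checks every periodSeconds, so a hung container can sit there until the next check, no matter whether the container or the whole pod gets restarted afterwards.

livenessProbe:
  httpGet:
    path: /healthz       # example endpoint, adjust for your app
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 15      # checks only happen this often
  failureThreshold: 3    # and it takes a few failures before a restart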

So, sorry, but can I ask again: what is the problem you want to address? :)


On Friday, October 27, 2017, David Rosenstrauch <dar...@darose.net> wrote:

I was speaking to our admin here, and he suggested that running a health-check container inside the same pod might work. Would anyone agree that that would be a good (or even preferred) approach?
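To make that concrete, something along these lines is what he had in mind (just a sketch; the image names and the checker's behavior are hypothetical):

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: app
    image: example.com/myapp:latest             # hypothetical app image
  - name: health-checker
    image: example.com/health-checker:latest    # hypothetical sidecar
    # the sidecar would poll the app and, if it looks unhealthy,
    # delete the pod via the apiserver so the ReplicaSet replaces it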

Thanks,

DR

On 2017-10-27 11:41 am, David Rosenstrauch wrote:

I have a pod which runs a single container.  The pod is being run
under a ReplicaSet (which starts a new pod to replace a pod that's
terminated).
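For reference, the setup is roughly this (simplified; the names and image are placeholders, and the apiVersion may differ depending on your cluster version):

apiVersion: apps/v1beta2
kind: ReplicaSet
metadata:
  name: myapp-rs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: example.com/myapp:latest   # placeholder image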


What I'm seeing is that when the container within that pod terminates,
instead of the pod terminating too, the pod stays alive and just
restarts the container in it.  However, I'm thinking that what would
make more sense is for the entire pod to terminate in this situation,
and for another one to be started automatically to replace it.

Does this seem sensible?  If so, how would one accomplish this with
k8s?  Changing the restart policy setting doesn't seem to be an
option: the restart policy (e.g., restartPolicy: Always) seems to
control whether containers get restarted in place, but there doesn't
appear to be a way to say "don't restart the container; terminate the
pod instead."  (At least not that I could see.)
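For reference, this is the field I was looking at; as far as I can tell it sits at the pod level of the template, something like the following (image name is a placeholder):

spec:
  restartPolicy: Always   # the only knob I could find; Always / OnFailure / Never
  containers:
  - name: myapp
    image: example.com/myapp:latest   # placeholder image

and none of those values seems to mean "replace the whole pod instead of restarting the container."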

Would appreciate any guidance anyone could offer here.

Thanks,

DR

