On Fri, Oct 27, 2017 at 2:17 PM, David Rosenstrauch <dar...@darose.net> wrote:
> I'm trying to make sure that as I'm deploying new services on our cluster,
> that failures/restarts get handled in a way that's most optimal for
> resiliency/uptime.
>
>
> I'm simplifying things a bit, but if a piece of code running inside a
> container crashes, there are more or less two possibilities: 1) a bug in
> the code (and/or it's trying to process data that causes an error), or 2)
> a problem with the hardware/network (full disk, bad disk, network outage,
> etc.).  If the issue is #1, then it doesn't matter whether you restart the
> container or the pod.  But if the issue is #2, then restarting the pod
> (i.e., on another host) would fix the problem, while restarting the
> container probably wouldn't.

We automatically detect things like full disks and network outages and
remove or repair those nodes.  Are you working around a known problem
or a hypothetical?

> So I guess this is sort of alluding to a bigger question, then: does k8s
> have any ability to detect if a host is having hardware problems and, if
> so, avoid scheduling new pods on it, move pods off of it if their
> containers are crashing, etc.?

Yes.  Node Problem Detector detects a number of issues and responds.
GKE's NodeAutoRepair will automatically rebuild nodes when it detects
problems.
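
For reference, a rough sketch of how to look at this yourself (the node,
pool, cluster, and zone names below are placeholders, not anything from
this thread):

```shell
# Inspect the conditions the kubelet and node-problem-detector report
# (Ready, DiskPressure, MemoryPressure, PIDPressure, NetworkUnavailable, ...):
kubectl get nodes
kubectl describe node <node-name>

# On GKE, node auto-repair is enabled per node pool; assuming an existing
# cluster and pool:
gcloud container node-pools update my-pool \
    --cluster my-cluster --zone us-central1-a \
    --enable-autorepair
```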

> I've done a lot of work with big data systems previously and, IIRC, Hadoop
> (for example) used to employ procedures to detect if a disk was bad, if many
> tasks on a particular node kept crashing, etc., and it would start to
> blacklist those.  My thinking was that k8s worked similarly - i.e., if all
> containers in a pod terminated unsuccessfully, then terminate the pod; if a
> particular node is having many pods terminated unsuccessfully, then stop
> launching new pods on there, etc.  Perhaps I'm misunderstanding / assuming
> incorrectly though.

We probably should have a crash-loop mode that kills the pod and lets
the scheduler re-assess.  AFAIK, we don't do that today, but it hasn't
been a huge problem.
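
In the meantime, a manual stand-in for that "kill and let the scheduler
re-assess" behavior is just deleting the pod (pod, label, and node names
below are placeholders):

```shell
# Deleting a pod stuck in CrashLoopBackOff makes its ReplicaSet create a
# fresh replacement, which the scheduler places anew -- possibly on a
# different node:
kubectl get pods -l app=myapp
kubectl delete pod <crash-looping-pod-name>

# To keep new pods off a suspect node while you investigate:
kubectl cordon <node-name>
```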

> Thanks,
>
> DR
>
>
> On 2017-10-27 4:35 pm, 'Tim Hockin' via Kubernetes user discussion and Q&A
> wrote:
>>
>> What Rodrigo said - what problem are you trying to solve?
>>
>> The pod lifecycle is defined as restart-in-place, today.  Nothing you
>> can do inside your pod, except deleting it from the apiserver, will do
>> what you're asking.  It doesn't seem too far-fetched that a pod could
>> exit and "ask for a different node", but we're not going there without
>> a solid, solid, solid use case.
>>
>> On Fri, Oct 27, 2017 at 1:23 PM, Rodrigo Campos <rodrig...@gmail.com>
>> wrote:
>>>
>>> I don't think it is configurable.
>>>
>>> But I don't really see what you are trying to solve; maybe there is
>>> another way to achieve it?  If you are running a pod with a single
>>> container, what is the problem with the container being restarted,
>>> when appropriate, instead of the whole pod?
>>>
>>> I mean, you would still need to handle the case where some container in
>>> the pod crashed or stalled, right?  The liveness probe runs
>>> periodically, but until the next check happens, the container can hang
>>> or otherwise misbehave.  That problem exists whether the container or
>>> the whole pod is restarted, so restarting the whole pod won't solve it.
>>> So my guess about what you are trying to solve is probably not correct.
>>>
>>> So, sorry, but can I ask again what is the problem you want to address?
>>> :)
>>>
>>>
>>> On Friday, October 27, 2017, David Rosenstrauch <dar...@darose.net>
>>> wrote:
>>>>
>>>>
>>>> Was speaking to our admin here, and he offered that running a health
>>>> check
>>>> container inside the same pod might work.  Anyone agree that that would
>>>> be a
>>>> good (or even preferred) approach?
>>>>
>>>> Thanks,
>>>>
>>>> DR
>>>>
>>>> On 2017-10-27 11:41 am, David Rosenstrauch wrote:
>>>>>
>>>>>
>>>>> I have a pod which runs a single container.  The pod is being run
>>>>> under a ReplicaSet (which starts a new pod to replace a pod that's
>>>>> terminated).
>>>>>
>>>>>
>>>>> What I'm seeing is that when the container within that pod
>>>>> terminates, instead of the pod terminating too, the pod stays alive
>>>>> and just restarts the container.  I'm thinking it would make more
>>>>> sense for the entire pod to terminate in this situation, and for
>>>>> another to automatically start to replace it.
>>>>>
>>>>> Does this seem sensible?  If so, how would one accomplish this with
>>>>> k8s?  Changing the restart policy setting doesn't seem to be an
>>>>> option.  The restart policy (e.g. restartPolicy=Always) seems to
>>>>> apply only to whether to restart a pod; the decision about whether to
>>>>> restart a container in a pod doesn't seem to be configurable.  (At
>>>>> least not that I could see.)
>>>>>
>>>>> Would appreciate any guidance anyone could offer here.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> DR
>>>>
>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups
>>>> "Kubernetes user discussion and Q&A" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an
>>>> email to kubernetes-users+unsubscr...@googlegroups.com.
>>>> To post to this group, send email to kubernetes-users@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/kubernetes-users.
>>>> For more options, visit https://groups.google.com/d/optout.
