Hi James,

The "condition" list you described fits our modeling pretty well, although
I don't know whether the eviction is performed by a scheduler or by the
local kubelet.

Do you know whether the conditions can be extended, i.e., whether an
operator can define additional conditions that are not in the provided list?
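For readers less familiar with the Kubernetes side, the condition-to-taint-to-eviction flow described in the quoted message looks roughly like the manifest fragments below (a sketch based on the Kubernetes docs linked in the quote; the values are illustrative):

```yaml
# Condition published in the node's status (reported by the kubelet):
status:
  conditions:
  - type: Ready
    status: "False"
    reason: KubeletNotReady

# The node controller translates the condition into a taint on the node:
spec:
  taints:
  - key: node.kubernetes.io/not-ready
    effect: NoExecute

# Pods without a matching toleration are evicted from the node; a
# toleration with tolerationSeconds bounds how long a pod may stay:
tolerations:
- key: node.kubernetes.io/not-ready
  effect: NoExecute
  tolerationSeconds: 300
```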

On Tue, Feb 20, 2018 at 3:54 PM, James Peach <[email protected]> wrote:

>
> > On Feb 20, 2018, at 11:11 AM, Zhitao Li <[email protected]> wrote:
> >
> > Hi,
> >
> > At a recent Mesos meetup, quite a few cluster operators expressed
> > complaints that it is hard to model host issues with Mesos at the
> > moment.
> >
> > For example, in our environment, the only signal a scheduler receives is
> > whether a Mesos agent has disconnected from the cluster. However, we see
> > a family of other issues in real production that make hosts (sometimes
> > "partially") unusable. Examples include:
> > - traffic routing software malfunction (e.g., haproxy): the Mesos agent
> > does not depend on it, so the scheduler/deployment system is not aware,
> > but actual workloads on the cluster will fail;
> > - broken disk;
> > - other long running system agent issues.
> >
> > This email looks at how Mesos can recommend best practices for surfacing
> > these issues to schedulers, and whether we need additional primitives in
> > Mesos to achieve that goal.
>
> In the K8s world the node can publish "conditions" that describe its status
>
>         https://kubernetes.io/docs/concepts/architecture/nodes/#condition
>
> The condition can automatically taint the node, which can cause pods to
> be evicted (i.e., if they can't tolerate that specific taint).
>
> J

-- 
Cheers,

Zhitao Li
