I understand and am in agreement that `HealthCheckStatusInfo` will have
more information than `CheckStatusInfo`.

I would like us to put a little more thought into how that would look like
to be doubly sure that what we are introducing today will be evolvable into
that envisioned future. We have to live with API changes for a long time,
so I would like to see more rigor here (e.g., has the note on top of the
`HealthCheckStatusInfo` in the doc
<https://docs.google.com/document/d/1VLdaH7i7UDT3_38aOlzTOtH7lwH-laB8dCwNzte0DkU/edit#heading=h.lessdcojxc5v>
has
been discussed/resolved?) to avoid costly changes/deprecations.

On Thu, Oct 18, 2018 at 4:04 AM Alex Rukletsov <a...@mesosphere.com> wrote:

> Thanks for the thoughts, Vinod! Answers inlined.
>
> On Wed, Oct 17, 2018 at 8:55 PM Vinod Kone <vinodk...@apache.org> wrote:
>
> > One of the things we discussed when we added `CheckInfo` and
> > `CheckStatusInfo` was to make the older `HealthCheck` and `bool healthy`
> > field (inside `TaskStatus`) consistent with the new `Check` format.
> >
> Correct.
>
> >
> > IIRC, some of the changes we wanted to do were
> >
> >    - Deprecate `HealthCheck` and introduce a new `HealthCheckInfo` proto
> >
> Correct.
>
> >    - The nested messages inside `HealthCheck` (e.g., `HTTPCheckInfo`)
>
>    should be named differently in `HealthCheckInfo` (e.g., `Http`)
> >
> Likely, yes.
>
> >    - Deprecate `bool healthy` in TaskStatusInfo and introduce a new
> >    `HealthCheckStatusInfo` which looks similar to `CheckStatusInfo`
> >
> Correct.
>
> >
> > Right now, the proposal seems to only address the last point without
> > addressing the first two, which feels weird to me. I would prefer to see
> > them addressed in one shot.
> >
> Can you please explain why? Is there any problem you foresee if we do it
> step by step? Introducing `HealthCheckStatusInfo` now solves an important
> problem and does not seem to introduce new issues.
>
> >
> > Additionally, the proposed `HealthCheckStatusInfo` proto looks completely
> > different from `CheckStatusInfo`. Is that intentional? I hope we are not
> > thinking of deprecating it again when we come around to fix `HealthCheck`
> > proto to be consistent with `CheckInfo` ?
> >
> How do you think it should look like? Why will we deprecate it?
>
> Health checks are different from checks in the way the result of a check is
> interpreted on the agent. In other words health check is an extra step on
> top of a check. We might include `CheckStatusInfo` or its contents into
> `HealthCheckStatusInfo`, but... should we think about this now? It is nice
> to have lower level info from the check in the heath status update, but it
> also means more data to transfer. But interpretation—health—we definitely
> need.
>
> Greg, I'm +1 on your proposal.
>
> >
> > Thanks,
> >
> > On Wed, Oct 17, 2018 at 1:26 PM Greg Mann <g...@mesosphere.io> wrote:
> >
> > > Hi all,
> > > Some users have recently reported issues with our current
> implementation
> > > of health checks. See this ticket
> > > <https://issues.apache.org/jira/browse/MESOS-6417> for an introduction
> > to
> > > the issue.
> > >
> > > To summarize: we currently use a single 'optional bool healthy' field
> > > within the 'TaskStatus' message to indicate the result of a health
> check.
> > > This allows us to expose 3 health states to users:
> > > 1) 'healthy' field is unset = no health check specified, or health
> check
> > > failed but grace period has not yet elapsed, or health check has not
> yet
> > > been attempted
> > > 2) 'healthy' field is set to 'false' = a health check is specified and
> it
> > > returned 'false'
> > > 3) 'healthy' field is set to 'true' = a health check is specified and
> it
> > > returned 'true'
> > >
> > > The issue is that some users need to distinguish between the three
> > > scenarios in #1: no health check is specified, OR the task is not yet
> > > healthy but we are in the grace period. An example use case would be a
> > load
> > > balancer which needs to wait for a healthy status to route traffic, but
> > > which immediately routes traffic to tasks which have no health check
> > > defined.
> > >
> > > This issue was recognized during the design of Mesos generalized
> checks;
> > > for those checks, we use the presence of the 'check_status' field to
> > > indicate whether or not a check is defined for the task. While
> consumers
> > > could make use of generalized checks as a workaround, this does not
> allow
> > > them to both detect the presence of a check AND achieve the
> task-killing
> > > behavior that health checks provide.
> > >
> > > In order to address this, I would like to propose the following new
> > > message, and an addition to the 'TaskStatus' message:
> > >
> > > message HealthCheckStatusInfo {
> > >   enum Status {
> > >     UNKNOWN = 0;
> > >     HEALTHY = 1;
> > >     UNHEALTHY = 2;
> > >   }
> > >
> > >   required Status status = 0;
> > > }
> > >
> > > message TaskStatus {
> > >   . . .
> > >
> > >   optional HealthCheckStatusInfo health_check_status = 17;
> > >
> > >   . . .
> > > }
> > >
> > > The semantics of these fields would be as follows:
> > >
> > > 'health_status' field:
> > > - If set, a health check has been set
> > > - If unset, a health check has not been set
> > >
> > > 'health_status.status' field:
> > > - UNKNOWN: The task has not become healthy but is still within its
> grace
> > > period (this state is also used if an internal error prevents us from
> > > running the health check successfully)
> > > - HEALTHY: The health check indicates the task is healthy
> > > - UNHEALTHY: The health check indicates the task is not healthy
> > >
> > > This change would also involve deprecating the existing 'healthy'
> field.
> > > In accordance with our deprecation policy, I believe we could not
> remove
> > > the deprecated field until we have a new major release (2.x).
> > >
> > > I'd love to hear feedback on this proposal, thanks in advance! I'll
> also
> > > add this as an agenda item to our upcoming API working group meeting on
> > > Tuesday, Oct. 16 at 11am PST.
> > >
> > > Cheers,
> > > Greg
> > >
> >
>

Reply via email to