[
https://issues.apache.org/jira/browse/MESOS-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940208#comment-14940208
]
Gabriel Hartmann edited comment on MESOS-3479 at 10/1/15 7:33 PM:
------------------------------------------------------------------
I'm not sure I understand all of the above concerning particular behavior
scenarios. I believe the principle should be that health check attempts never
stop while a Scheduler is running. If a health check fails the required number
of consecutive times outside the grace period, it is OK to stop making health
checks, but only under the assumption that the Scheduler is going to be killed.
What is meant by "The value of the command is not correct"? Shouldn't this
just be counted as a health check failure, so that the Framework is counted as
unhealthy? It's fine for the health check to fail, potentially forever, if the
command is malformed.
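
To make the rule above concrete, here is a minimal sketch in Python. It is not
the Mesos implementation; the names HealthState, record_result, grace_period
and max_failures are hypothetical and chosen only for illustration. It assumes
that failures inside the grace period are ignored, that a malformed command is
reported as an ordinary failed attempt, and that checking stops only once the
task is about to be killed.

```python
from dataclasses import dataclass


@dataclass
class HealthState:
    """Hypothetical bookkeeping for one health-checked task."""
    consecutive_failures: int = 0
    kill_requested: bool = False


def record_result(state, healthy, elapsed, grace_period, max_failures):
    """Record one check attempt; return True if checking should continue.

    `elapsed` is the time in seconds since the task started. A malformed
    command is expected to arrive here as healthy=False, like any other
    failed attempt.
    """
    if healthy:
        # Any success resets the consecutive-failure counter.
        state.consecutive_failures = 0
        return True
    if elapsed < grace_period:
        # Failures inside the grace period are not counted.
        return True
    state.consecutive_failures += 1
    if state.consecutive_failures >= max_failures:
        # Stop checking only because the task is now going to be killed.
        state.kill_requested = True
        return False
    return True
```

With this shape, a permanently failing command keeps being checked until the
failure threshold is reached, and the only way the loop ends is with
kill_requested set.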
was (Author: [email protected]):
I'm not sure I understand all of the above concerning particular behavior
scenarios. I believe the principle should be that health check attempts never
stop while a Scheduler is running. If a health check fails the required number
of consecutive times outside the grace period it is ok to stop making health
checks, but only under the assumption that the Scheduler is going to be killed.
What is meant by, "The value of the command is not correct"? Shouldn't this
just be counted as a health check failure so the Framework is counted as
unhealthy, potentially forever if the command is malformed?
> COMMAND Health Checks are not executed if the timeout is exceeded
> -----------------------------------------------------------------
>
> Key: MESOS-3479
> URL: https://issues.apache.org/jira/browse/MESOS-3479
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.23.0
> Reporter: Matthias Veit
> Assignee: haosdent
> Priority: Critical
>
> The issue first appeared as a Marathon bug. For reference, see:
> https://github.com/mesosphere/marathon/issues/2179.
> A COMMAND health check is defined with a timeout of 20 seconds.
> The command itself takes longer than 20 seconds to execute.
> Current behavior:
> - The Mesos health check process gets killed, but the defined command
> process does not (in the example, the curl command returns after 21 seconds).
> - The check attempt is considered healthy if the timeout is exceeded.
> - The health check stops and is no longer executed.
> Expected behavior:
> - The defined health check command is killed when the timeout is exceeded.
> - The check attempt is considered unhealthy if the timeout is exceeded.
> - The health check does not stop.
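
The expected behavior listed above can be sketched in Python as follows. This
is only an illustration under stated assumptions, not how Mesos implements
health checks: run_check and run_checks are hypothetical names, the command is
assumed to be run through a shell, and the process-group handling is POSIX
only. The command is started in its own session so that a timeout kills the
command itself and not just the wrapper process around it.

```python
import os
import signal
import subprocess


def run_check(command, timeout):
    """Run one COMMAND check attempt; return True only if it is healthy."""
    # start_new_session=True makes the shell a process-group leader, so the
    # shell and anything it spawned (e.g. curl) can be killed together.
    proc = subprocess.Popen(
        command,
        shell=True,
        start_new_session=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Healthy means the command exited with status 0 within the timeout.
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        try:
            # Kill the whole process group, not only the wrapper shell.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.wait()  # reap the child so no zombie is left behind
        return False  # a timed-out attempt counts as unhealthy


def run_checks(command, timeout, attempts):
    """Run several attempts in a row; a timeout never ends the series."""
    return [run_check(command, timeout) for _ in range(attempts)]
```

For example, run_checks("sleep 5", 0.2, 2) reports two unhealthy attempts
instead of stopping after the first timeout, and a malformed command simply
shows up as a non-zero exit status, i.e. as an unhealthy attempt.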