[
https://issues.apache.org/jira/browse/MESOS-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940931#comment-14940931
]
Matthias Veit edited comment on MESOS-3479 at 10/2/15 9:07 AM:
---------------------------------------------------------------
Hey [~haosdent], this part is crucial: a single failed health check does not
mean the task needs to be killed (that, by the way, is the scheduler's decision,
not Mesos's).
Example:
HealthCheckCommand: sleep 30 with a timeout of 20 seconds and a limit of
3 consecutive failures:
- After 0 seconds:
-- sleep 30 is started with pid 1
- After 20 seconds:
-- sleep 30 with pid 1 is killed
-- Task marked unhealthy
-- A new sleep 30 is started with pid 2
- After 40 seconds:
-- sleep 30 with pid 2 is killed
-- Task marked unhealthy
-- A new sleep 30 is started with pid 3
- After 60 seconds:
-- sleep 30 with pid 3 is killed
-- Task marked unhealthy
-- Scheduler will kill that task
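The timeline above can be reproduced with a small simulation. This is only a sketch of the example, not Mesos or Marathon code: the function name `simulate` and its parameters are made up for illustration, and it assumes the next check starts immediately after the previous one is killed, as in the example.

```python
def simulate(command_duration, timeout, max_consecutive_failures):
    """Return (time, event) tuples for the example timeline.

    Hypothetical model: a check that runs longer than `timeout` is killed
    and counts as one failure; after `max_consecutive_failures` failures
    in a row, the scheduler (not Mesos) kills the task.
    """
    events = []
    now = 0
    failures = 0
    pid = 0
    while True:
        pid += 1
        events.append((now, f"check started with pid {pid}"))
        if command_duration <= timeout:
            # The command finished in time; the simulation stops here
            # for simplicity (a real checker would keep checking).
            now += command_duration
            events.append((now, "task marked healthy"))
            return events
        # The command exceeded the timeout: kill it and count a failure.
        now += timeout
        events.append((now, f"check with pid {pid} is killed"))
        failures += 1
        events.append((now, "task marked unhealthy"))
        if failures >= max_consecutive_failures:
            events.append((now, "scheduler kills the task"))
            return events


if __name__ == "__main__":
    for time, event in simulate(command_duration=30, timeout=20,
                                max_consecutive_failures=3):
        print(f"after {time} seconds: {event}")
```

The point of the model is that a single timeout only marks the task unhealthy; the kill happens only once the failure limit is reached.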
> COMMAND Health Checks are not executed if the timeout is exceeded
> -----------------------------------------------------------------
>
> Key: MESOS-3479
> URL: https://issues.apache.org/jira/browse/MESOS-3479
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.23.0
> Reporter: Matthias Veit
> Assignee: haosdent
> Priority: Critical
>
> The issue first appeared as a Marathon bug; for reference, see
> https://github.com/mesosphere/marathon/issues/2179.
> A COMMAND health check is defined with a timeout of 20 seconds.
> The command itself takes longer than 20 seconds to execute.
> Current behavior:
> - The Mesos health check process gets killed, but the defined command
> process does not (in the example, the curl command returns after 21 seconds).
> - The check attempt is considered healthy if the timeout is exceeded
> - The health check stops and is no longer executed
> Expected behavior:
> - The defined health check command is killed when the timeout is exceeded
> - The check attempt is considered unhealthy if the timeout is exceeded
> - The health check does not stop
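The expected behavior listed above could look roughly like the following for a single check attempt. This is a hedged sketch using Python's standard `subprocess` module, not the Mesos health checker; `run_check` is a hypothetical helper name. It relies on `subprocess.run` killing the child process when the timeout expires.

```python
import subprocess
import sys


def run_check(argv, timeout_seconds):
    """Run one health check attempt; return True if healthy.

    Hypothetical helper: a timeout or a non-zero exit code both count
    as an unhealthy attempt. The caller is expected to keep scheduling
    further attempts either way.
    """
    try:
        completed = subprocess.run(
            argv,
            timeout=timeout_seconds,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child and waited for it.
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    # A command that runs longer than the timeout is reported as unhealthy.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_check(slow, timeout_seconds=0.5))
```

Note that only the direct child is killed here. If the command is wrapped in a shell, descendants may keep running unless the whole process group is signalled, which resembles the surviving command process described in the report.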