[ 
https://issues.apache.org/jira/browse/MESOS-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14940931#comment-14940931
 ] 

Matthias Veit edited comment on MESOS-3479 at 10/2/15 9:07 AM:
---------------------------------------------------------------

Hey [~haosdent] this part is crucial: a single failed health check does not
mean the task needs to be killed (that, by the way, is a decision of the
scheduler, not Mesos).

Example:
HealthCheckCommand: sleep 30 with a timeout of 20 seconds, allowing 3
consecutive failures:

- After 0 seconds:
    -- sleep 30 is started with pid 1
- After 20 seconds:
    -- sleep 30 with pid 1 is killed
    -- Task marked unhealthy
    -- A new sleep 30 is started with pid 2
- After 40 seconds:
    -- sleep 30 with pid 2 is killed
    -- Task marked unhealthy
    -- A new sleep 30 is started with pid 3
- After 60 seconds:
    -- sleep 30 with pid 3 is killed
    -- Task marked unhealthy (third consecutive failure)
    -- The scheduler will kill the task
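The timeline above can be replayed on a simulated clock. The following is a minimal Python sketch, not Mesos or Marathon code: `simulate_checks` is a hypothetical helper, the "pid" values are just attempt numbers mirroring the example, and it assumes (1) each new attempt starts immediately after the previous one is killed, with no interval between checks, and (2) an attempt that finishes within the timeout is a healthy result. Killing the task once the failure threshold is reached is modeled as the scheduler's decision, as described above.

```python
def simulate_checks(durations_s, timeout_s, max_consecutive_failures):
    """Replay health check attempts on a simulated clock (hypothetical helper).

    durations_s[i] is how long attempt i+1 would run if it were never killed.
    Returns (log_lines, task_killed).
    """
    log = []
    now = 0
    consecutive_failures = 0
    # "pid" is only the attempt number, mirroring pid 1, 2, 3 in the example.
    for pid, duration in enumerate(durations_s, start=1):
        log.append(f"t={now}s: start check, pid {pid}")
        if duration <= timeout_s:
            # Assumption: finishing within the timeout means a healthy result,
            # which resets the consecutive-failure counter.
            now += duration
            consecutive_failures = 0
            log.append(f"t={now}s: pid {pid} finished in time, task healthy")
            continue
        # The attempt exceeded the timeout: it is killed and counts as a failure,
        # but checking continues with a fresh attempt.
        now += timeout_s
        consecutive_failures += 1
        log.append(f"t={now}s: pid {pid} killed after timeout, task unhealthy")
        if consecutive_failures >= max_consecutive_failures:
            # Only now does the (simulated) scheduler decide to kill the task.
            log.append(f"t={now}s: {consecutive_failures} consecutive failures, "
                       "scheduler kills task")
            return log, True
    return log, False


log, killed = simulate_checks([30, 30, 30], timeout_s=20, max_consecutive_failures=3)
for line in log:
    print(line)
```

Running it prints seven lines ending with `t=60s: 3 consecutive failures, scheduler kills task`. A healthy attempt in between, for example `simulate_checks([30, 5, 30, 30], 20, 3)`, resets the counter, so the task is not killed.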


> COMMAND Health Checks are not executed if the timeout is exceeded
> -----------------------------------------------------------------
>
>                 Key: MESOS-3479
>                 URL: https://issues.apache.org/jira/browse/MESOS-3479
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.23.0
>            Reporter: Matthias Veit
>            Assignee: haosdent
>            Priority: Critical
>
> The issue first appeared as a Marathon bug; see here for reference:
> https://github.com/mesosphere/marathon/issues/2179.
> A COMMAND health check is defined with a timeout of 20 seconds.
> The command itself takes longer than 20 seconds to execute.
> Current behavior:
> - The Mesos health check process gets killed, but the defined command
> process does not (in the example, the curl command returns after 21 seconds).
> - The check attempt is considered healthy if the timeout is exceeded.
> - The health check stops and is no longer executed.
> Expected behavior:
> - The defined health check command is killed when the timeout is exceeded.
> - The check attempt is considered unhealthy if the timeout is exceeded.
> - The health check does not stop; subsequent attempts keep being executed.
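The expected behavior can be sketched in Python on a POSIX system. This is a minimal illustration, not the Mesos health checker: `run_check` and `check_loop` are hypothetical names. One plausible reason a command can outlive its checker is that only the wrapping process is signalled, so, as an assumption, the sketch starts the command in its own session (`start_new_session=True`, which makes it a process-group leader) and on timeout kills the whole group with `os.killpg`, counts the attempt as unhealthy, and keeps checking.

```python
import os
import signal
import subprocess


def run_check(command, timeout_s):
    """Run one COMMAND health check attempt; return True if healthy."""
    proc = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # setsid(): the shell leads a new process group
    )
    try:
        # Healthy only if the command exits with status 0 within the timeout.
        return proc.wait(timeout=timeout_s) == 0
    except subprocess.TimeoutExpired:
        # Kill the whole process group (shell plus anything it spawned),
        # not just the wrapping shell.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited between the timeout and the kill
        proc.wait()  # reap the shell so no zombie is left behind
        # Exceeding the timeout counts as a failed (unhealthy) attempt.
        return False


def check_loop(command, timeout_s, attempts):
    """Keep running checks after a timed-out attempt instead of stopping."""
    return [run_check(command, timeout_s) for _ in range(attempts)]
```

For example, `check_loop("sleep 30", 20, 3)` would return `[False, False, False]` after roughly 60 seconds, one unhealthy result per timed-out attempt, and no `sleep` process would be left running.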



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
