[ https://issues.apache.org/jira/browse/AURORA-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977085#comment-14977085 ]

Bill Farner commented on AURORA-279:
------------------------------------

A concrete scenario is when instances of a service hit a synchronized GC pause
(on the JVM) that causes the executor to think the local instance is unhealthy.
In that scenario, killing all instances simultaneously is definitely worse than
leaving them alone.  Of course, solving this relatively rare problem would take
a decent amount of engineering.
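A minimal sketch of the idea, assuming the executor only reports health check
results and the scheduler decides which failing instances to kill.
`HealthReport`, `instances_to_kill`, `max_unhealthy_fraction` and
`max_kills_per_round` are hypothetical names chosen for illustration; they are
not part of Aurora's API, and a real implementation would consult the job's
actual SLA requirements instead of fixed thresholds.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HealthReport:
    """Hypothetical per-instance health status forwarded by the executor."""
    instance_id: int
    healthy: bool


def instances_to_kill(
    reports: List[HealthReport],
    max_unhealthy_fraction: float = 0.5,
    max_kills_per_round: int = 1,
) -> List[int]:
    """Pick which unhealthy instances the scheduler kills in this round.

    Both thresholds are illustrative assumptions, not Aurora settings.
    """
    if not reports:
        return []
    unhealthy = sorted(r.instance_id for r in reports if not r.healthy)
    # Many instances failing at once (e.g. a synchronized JVM GC pause) suggests
    # a correlated cause rather than broken instances: killing them all would
    # turn a transient stall into an outage, so leave the job alone.
    if len(unhealthy) / len(reports) > max_unhealthy_fraction:
        return []
    # Otherwise kill only a bounded number per round, as a stand-in for an
    # SLA check on how many instances may be down at the same time.
    return unhealthy[:max_kills_per_round]


if __name__ == "__main__":
    # 10 instances, all failing health checks during a synchronized GC pause.
    gc_pause = [HealthReport(i, healthy=False) for i in range(10)]
    print(instances_to_kill(gc_pause))  # []

    # 10 instances, two of them genuinely unhealthy.
    two_bad = [HealthReport(i, healthy=i not in (3, 7)) for i in range(10)]
    print(instances_to_kill(two_bad))  # [3]
    print(instances_to_kill(two_bad, max_kills_per_round=2))  # [3, 7]
```

The key design choice is that a job-wide view is needed to tell a correlated
stall apart from individual broken instances, which is information the executor
on a single host does not have.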

> Allow scheduler to decide how to respond to task health check failures
> ----------------------------------------------------------------------
>
>                 Key: AURORA-279
>                 URL: https://issues.apache.org/jira/browse/AURORA-279
>             Project: Aurora
>          Issue Type: Story
>          Components: Executor, Scheduler
>            Reporter: Bill Farner
>            Priority: Minor
>
> The executor is currently autonomous in deciding to kill tasks that have 
> failed health checks.  If health check failures synchronize across a service, 
> the service could suffer an outage.  SLA considerations may also need to be
> taken into account before deciding to kill a task for health check failures.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
