[ 
https://issues.apache.org/jira/browse/AURORA-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe Smith updated AURORA-652:
-----------------------------

    Component/s: Reliability

> GC Executor termination should be co-ordinated by the scheduler
> ---------------------------------------------------------------
>
>                 Key: AURORA-652
>                 URL: https://issues.apache.org/jira/browse/AURORA-652
>             Project: Aurora
>          Issue Type: Bug
>          Components: Reliability
>            Reporter: Vinod Kone
>
> Currently there is no co-ordination between when the gc executor exits and 
> when a gc task is launched. This results in a race where a slave might launch 
> a task on an executor that has just exited, because the slave does not yet 
> know about the exit.
> This problem is exacerbated during Aurora failover because a slew of gc tasks 
> are launched, increasing the probability of the race across the cluster. 
> Currently this triggers our LOST tasks alerts whenever a scheduler failover 
> happens. While AURORA-608 will reduce the probability, coordinating the gc 
> executor exit via Aurora seems like the right solution (proposed in 
> AURORA-973 but not implemented AFAICT).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
