[ 
https://issues.apache.org/jira/browse/AURORA-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kone updated AURORA-652:
------------------------------

    Description: 
Currently there is no co-ordination between when the gc executor exits and when 
a gc task is launched. This results in a race where a slave might launch a task 
on an executor that has just exited (but the slave doesn't know about it yet).

This problem is exacerbated during Aurora failover because a slew of gc tasks 
are launched, increasing the probability of the race across the cluster. 
While AURORA-608 will reduce the probability, coordinating the gc executor exit 
via Aurora seems like the right solution.

  was:
Currently there is no co-ordination between when the gc executor exits and when 
a gc task is launched. This results in a race where a slave might launch a task 
on an executor that has just exited (but the slave doesn't know about it yet).

This problem is exacerbated during Aurora failover because a slew of gc tasks 
are launched, increasing the probability of the race across the cluster. 
Currently this triggers our LOST tasks alerts whenever a scheduler failover 
happens. While AURORA-608 will reduce the probability, coordinating the gc 
executor exit via Aurora seems like the right solution.


> GC Executor termination should be co-ordinated by the scheduler
> ---------------------------------------------------------------
>
>                 Key: AURORA-652
>                 URL: https://issues.apache.org/jira/browse/AURORA-652
>             Project: Aurora
>          Issue Type: Bug
>          Components: Reliability
>            Reporter: Vinod Kone
>
> Currently there is no co-ordination between when the gc executor exits and 
> when a gc task is launched. This results in a race where a slave might launch 
> a task on an executor that has just exited (but the slave doesn't know about 
> it yet).
> This problem is exacerbated during Aurora failover because a slew of gc tasks 
> are launched, increasing the probability of the race across the cluster. 
> While AURORA-608 will reduce the probability, coordinating the gc executor 
> exit via Aurora seems like the right solution.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
