[
https://issues.apache.org/jira/browse/AURORA-652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinod Kone updated AURORA-652:
------------------------------
Description:
Currently there is no co-ordination between when the gc executor exits and when
a gc task is launched. This results in a race where a slave might launch a task
on an executor that has just exited (but the slave doesn't know about it yet).
This problem is exacerbated during aurora failover because a slew of gc tasks
are launched, increasing the probability of the race across the cluster.
While Aurora-608 will reduce the probability, coordinating the gc executor exit
via aurora seems like the right solution.
was:
Currently there is no co-ordination between when the gc executor exits and when
a gc task is launched. This results in a race where a slave might launch a task
on an executor that has just exited (but the slave doesn't know about it yet).
This problem is exacerbated during aurora failover because a slew of gc tasks
are launched, increasing the probability of the race across the cluster. At the
current time this is triggering our LOST tasks alerts whenever a scheduler
failover happens. While Aurora-608 will reduce the probability, coordinating
the gc executor exit via aurora seems like the right solution.
> GC Executor termination should be co-ordinated by the scheduler
> ---------------------------------------------------------------
>
> Key: AURORA-652
> URL: https://issues.apache.org/jira/browse/AURORA-652
> Project: Aurora
> Issue Type: Bug
> Components: Reliability
> Reporter: Vinod Kone
>
> Currently there is no co-ordination between when the gc executor exits and
> when a gc task is launched. This results in a race where a slave might launch
> a task on an executor that has just exited (but the slave doesn't know about
> it yet).
> This problem is exacerbated during aurora failover because a slew of gc tasks
> are launched, increasing the probability of the race across the cluster.
> While Aurora-608 will reduce the probability, coordinating the gc executor
> exit via aurora seems like the right solution.
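
The race described above can be illustrated with a toy model. This is only a
sketch under stated assumptions, not Aurora or Mesos code: the Slave class, its
fields, and the uncoordinated()/coordinated() helpers are hypothetical names,
and the issue does not specify what the actual coordination mechanism would
look like. The model assumes only that the slave's view of the executor can lag
behind reality, and that a task handed to an executor that has already exited
ends up LOST.

```python
from dataclasses import dataclass


@dataclass
class Slave:
    """Toy slave whose view of the gc executor can be stale (hypothetical)."""
    executor_running: bool = True   # ground truth about the executor process
    believes_running: bool = True   # what the slave currently thinks

    def executor_exits(self):
        # The executor process is gone, but the slave has not noticed yet.
        self.executor_running = False

    def observe_exit(self):
        # The slave catches up with the executor's real state.
        self.believes_running = self.executor_running

    def launch_gc_task(self):
        if not self.believes_running:
            # The slave knows the old executor is gone and starts a new one.
            self.executor_running = True
            self.believes_running = True
        # The task goes to whichever executor the slave believes is alive.
        return "RUNNING" if self.executor_running else "LOST"


def uncoordinated():
    slave = Slave()
    slave.executor_exits()         # executor exits on its own schedule
    return slave.launch_gc_task()  # launch arrives before the slave notices


def coordinated():
    slave = Slave()
    # Assumed protocol: the scheduler drives the exit and launches the next
    # gc task only after the slave has observed that exit.
    slave.executor_exits()
    slave.observe_exit()
    return slave.launch_gc_task()


print(uncoordinated())
print(coordinated())
```

In this model the only difference between the two paths is whether the launch
is ordered after the slave observes the exit, which is the ordering that
scheduler-side coordination is meant to guarantee.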