Vinod Kone created AURORA-652:
---------------------------------
Summary: GC Executor termination should be co-ordinated by the
scheduler
Key: AURORA-652
URL: https://issues.apache.org/jira/browse/AURORA-652
Project: Aurora
Issue Type: Bug
Reporter: Vinod Kone
Currently there is no co-ordination between when the GC executor exits and when
a GC task is launched. This results in a race in which a slave might launch a
task on an executor that has just exited, because the slave does not yet know
about the exit.
This problem is exacerbated during Aurora failover, because a slew of GC tasks
is launched at once, increasing the probability of the race across the cluster.
Currently this triggers our LOST task alerts whenever a scheduler
failover happens. While AURORA-608 will reduce the probability, coordinating
the GC executor exit via Aurora seems like the right solution (proposed in
AURORA-973 but not implemented, AFAICT).
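
To make the race concrete, here is a minimal toy model in Python. It is only a
sketch: the Executor and Slave classes and both functions are hypothetical
names invented for illustration, and none of it reflects actual Mesos or Aurora
code. It assumes the slave keeps a possibly stale view of whether the executor
is alive, and that a task routed to an already-exited executor ends up LOST.

```python
from dataclasses import dataclass


@dataclass
class Executor:
    """Hypothetical stand-in for a GC executor process."""
    alive: bool = True

    def exit(self) -> None:
        self.alive = False


@dataclass
class Slave:
    """Hypothetical slave that tracks a possibly stale view of its executor."""
    executor: Executor
    believes_alive: bool = True

    def observe_exit(self) -> None:
        # The slave catches up with the executor's real state.
        self.believes_alive = self.executor.alive

    def launch(self, task: str) -> str:
        if self.believes_alive:
            # The task is routed to the existing executor. If that executor
            # has already exited, the task is lost.
            return "RUNNING" if self.executor.alive else "LOST"
        # The slave knows the old executor is gone and starts a fresh one.
        self.executor = Executor()
        self.believes_alive = True
        return "RUNNING"


def uncoordinated() -> str:
    """Executor exits on its own; a GC task arrives before the slave notices."""
    executor = Executor()
    slave = Slave(executor)
    executor.exit()
    return slave.launch("gc-task")


def coordinated() -> str:
    """Scheduler drives the exit and launches only after the slave has seen it."""
    executor = Executor()
    slave = Slave(executor)
    executor.exit()        # exit requested by the scheduler (assumed protocol)
    slave.observe_exit()   # scheduler waits until the exit is acknowledged
    return slave.launch("gc-task")


if __name__ == "__main__":
    print("uncoordinated:", uncoordinated())
    print("coordinated:  ", coordinated())
```

In this model the uncoordinated path loses the task, while the coordinated path
gets a fresh executor, which is the ordering guarantee that scheduler-driven
termination is meant to provide.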