Vinod Kone created AURORA-652:
---------------------------------

             Summary: GC Executor termination should be co-ordinated by the 
scheduler
                 Key: AURORA-652
                 URL: https://issues.apache.org/jira/browse/AURORA-652
             Project: Aurora
          Issue Type: Bug
            Reporter: Vinod Kone


Currently there is no co-ordination between when the gc executor exits and when
a gc task is launched. This results in a race: a slave might launch a task on an
executor that has just exited, before the slave has learned that it exited.

This problem is exacerbated during an Aurora failover, because a slew of gc tasks
are launched at once, increasing the probability of hitting the race across the
cluster. At present this triggers our LOST tasks alerts whenever a scheduler
failover happens. While AURORA-608 will reduce the probability, coordinating
the gc executor exit via Aurora seems like the right solution (proposed in
AURORA-973 but not implemented AFAICT).
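A minimal sketch of what scheduler-side coordination could look like, assuming the
scheduler (not the executor's own idle timer) decides when a slave's gc executor
exits. `GcCoordinator`, `ExecutorState`, `launch_gc`, `request_shutdown` and
`on_executor_terminated` are hypothetical names, not Aurora's actual API. Sending
the shutdown request and receiving the exit confirmation (presumably via the
slave, e.g. an executor-lost notification) are assumed and not modelled. The key
idea is that a gc task is never sent to an executor the scheduler has already
told to exit. Such tasks are deferred until the exit is confirmed.

```python
from enum import Enum, auto


class ExecutorState(Enum):
    RUNNING = auto()      # executor alive; may receive gc tasks
    TERMINATING = auto()  # scheduler asked it to exit; no new tasks
    # A slave absent from the state dict has no gc executor.


class GcCoordinator:
    """Hypothetical scheduler-side bookkeeping for gc executor lifetimes."""

    def __init__(self):
        self._state = {}    # slave_id -> ExecutorState
        self._pending = {}  # slave_id -> gc tasks deferred during termination

    def launch_gc(self, slave_id, task):
        state = self._state.get(slave_id)
        if state is ExecutorState.TERMINATING:
            # This is the racy case: without coordination the task could land
            # on an executor that is exiting. Hold it until exit is confirmed.
            self._pending.setdefault(slave_id, []).append(task)
            return "deferred"
        if state is None:
            self._state[slave_id] = ExecutorState.RUNNING
            return "launched-new-executor"
        return "launched-on-running-executor"

    def request_shutdown(self, slave_id):
        # The scheduler decides the executor is idle and tells it to exit.
        if self._state.get(slave_id) is ExecutorState.RUNNING:
            self._state[slave_id] = ExecutorState.TERMINATING
            return True
        return False

    def on_executor_terminated(self, slave_id):
        # The slave confirmed the exit: forget the executor, then replay any
        # deferred tasks, which will start a fresh executor.
        self._state.pop(slave_id, None)
        deferred = self._pending.pop(slave_id, [])
        return [self.launch_gc(slave_id, t) for t in deferred]
```

With this ordering, a burst of gc tasks after a failover either reuses a live
executor or waits for a confirmed exit. The slave never receives a task for an
executor the scheduler already knows is going away.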



--
This message was sent by Atlassian JIRA
(v6.2#6252)