Zeke Harris created AURORA-1215:
-----------------------------------

             Summary: Improve gc_executor to better handle tasks stuck in STARTING state
                 Key: AURORA-1215
                 URL: https://issues.apache.org/jira/browse/AURORA-1215
             Project: Aurora
          Issue Type: Task
          Components: Executor
            Reporter: Zeke Harris


If a task is lost on a slave while the scheduler still believes it is 
STARTING, the gc_executor has no record of the task, doesn't know what to do, 
and passes without taking any action. It should probably instead let the 
scheduler know that the task should be transitioned to a different state 
(FAILED?).

Here's an example log line showing this happening:
{code}I0320 07:22:01.281100 19634 executor_base.py:45] Executor 
[20150206-190136-2126263306-5050-29652-S51]: Know nothing about task 
1426024330051-mesos-test-oom-0-8e9c1594-fbba-4932-bb4e-140ce79100ad, but 
scheduler says STARTING - passing{code}
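
To make the proposed behavior concrete, here is a minimal sketch of the decision described above. It is not the gc_executor's actual code: the function name, its parameters, and the plain-string state names are hypothetical, and the target state is left configurable because the issue only tentatively suggests FAILED.

```python
from typing import Dict, Iterable


def reconcile_unknown_tasks(
    local_task_ids: Iterable[str],
    scheduler_states: Dict[str, str],
    target_state: str = "FAILED",  # assumption: the issue only suggests "FAILED?"
) -> Dict[str, str]:
    """Return {task_id: new_state} for tasks the scheduler believes are
    STARTING but that are unknown on this host.

    Hypothetical helper for illustration; tasks in any other state, and
    tasks that are known locally, are left alone.
    """
    known = set(local_task_ids)
    updates = {}
    for task_id, state in scheduler_states.items():
        if task_id not in known and state == "STARTING":
            # Instead of passing, propose a state transition to report back.
            updates[task_id] = target_state
    return updates


# Usage: one unknown STARTING task, one unknown RUNNING task.
updates = reconcile_unknown_tasks(
    local_task_ids=[],
    scheduler_states={"task-a": "STARTING", "task-b": "RUNNING"},
)
```

In this sketch only `task-a` would be reported back. How the update is actually delivered to the scheduler is left out, since the issue does not specify it.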



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
