[
https://issues.apache.org/jira/browse/MYRIAD-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14993548#comment-14993548
]
Adam B commented on MYRIAD-18:
--
TASK_LOST can occur for many reasons, including a network partition or a
lost/crashed agent. Generally this status implies that restarting the task may
succeed, as opposed to TASK_FAILED/TASK_ERROR, where a retry is
likely/guaranteed to fail again.
Other TASK_LOST scenarios:
- The scheduler driver is disconnected from the Mesos master at the time of an
acceptOffers (e.g. launchTasks) call from the scheduler.
- Accept/Launch call uses invalid/rescinded offers. (Maybe this should be a
TASK_ERROR?)
- Master asked to launch a task on an agent that has since been removed or
disconnected.
- Tried to reconcile a task unknown to Mesos.
- When a master discovers that an agent (slave) process has exited, it reports
TASK_LOST for any tasks from non-checkpointing frameworks.
- If an agent is shut down or removed completely, all of its tasks will report
TASK_LOST.
- Upon agent reregistration, any tasks known by the master but unknown by the
agent will report TASK_LOST.
- Agent could not launch the task because it failed to unschedule directories
for garbage collection.
- If the task/executor uses persistent volumes unknown to the agent.
- If the agent is asked to run a task using an existing executor that is
terminating/terminated.
- Agent asked to killTask for an unrecognized executor.
- Executor reregistration timeout expired.
- Failed to update resources for executor container (e.g. grow to launch new
task).
- Container/executor preempted by QoS controller.
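
To make the retry distinction concrete, here is a minimal sketch of a handler
that retries on TASK_LOST only up to a bound and treats TASK_FAILED/TASK_ERROR
as terminal. This is not Myriad's actual code: the class, the enums, and the
maxLostRetries parameter are all hypothetical names for illustration, and the
retry limit is an assumption rather than anything Mesos prescribes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from Myriad or the Mesos API.
class TaskLostRetrySketch {
    // Stand-in for the three task states discussed in this comment.
    enum State { TASK_LOST, TASK_FAILED, TASK_ERROR }

    // What the scheduler decides to do with the task.
    enum Action { RETRY, GIVE_UP }

    private final int maxLostRetries; // assumed policy knob
    private final Map<String, Integer> lostCounts = new HashMap<>();

    TaskLostRetrySketch(int maxLostRetries) {
        this.maxLostRetries = maxLostRetries;
    }

    Action onStatusUpdate(String taskId, State state) {
        switch (state) {
            case TASK_LOST: {
                // A lost task may succeed on relaunch, but count the attempts
                // so a task that keeps getting lost cannot loop forever.
                int attempts = lostCounts.merge(taskId, 1, Integer::sum);
                return attempts <= maxLostRetries ? Action.RETRY : Action.GIVE_UP;
            }
            case TASK_FAILED:
            case TASK_ERROR:
            default:
                // A retry is likely/guaranteed to fail again.
                return Action.GIVE_UP;
        }
    }

    public static void main(String[] args) {
        TaskLostRetrySketch policy = new TaskLostRetrySketch(2);
        for (int i = 0; i < 3; i++) {
            System.out.println(policy.onStatusUpdate("task-1", State.TASK_LOST));
        }
        System.out.println(policy.onStatusUpdate("task-2", State.TASK_FAILED));
    }
}
```

With a limit of 2, the third TASK_LOST for the same task yields GIVE_UP instead
of another pending/staging cycle; a real scheduler would probably also want to
add backoff and clear the counter once the task reaches TASK_RUNNING.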
> staging - pending loop
> --
>
> Key: MYRIAD-18
> URL: https://issues.apache.org/jira/browse/MYRIAD-18
> Project: Myriad
> Issue Type: Bug
> Reporter: Maysam Yabandeh
>
> If a staging task is lost for any reason, it gets stuck in a staging-pending
> loop.
> case TASK_LOST:
>   schedulerState.makeTaskPending(taskId);
>   break;