[jira] [Commented] (MYRIAD-18) staging - pending loop

2015-11-06 Thread Adam B (JIRA)

[ 
https://issues.apache.org/jira/browse/MYRIAD-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993548#comment-14993548
 ] 

Adam B commented on MYRIAD-18:
--

TASK_LOST can occur for many reasons, including a network partition or a 
lost/crashed agent. Generally, this status implies that restarting the task may 
succeed, as opposed to TASK_FAILED/TASK_ERROR, where a retry is 
likely/guaranteed to fail again.
Other TASK_LOST scenarios:
- The scheduler driver is disconnected from the Mesos master at the time of an 
acceptOffers (e.g. launchTasks) call from the scheduler.
- The Accept/Launch call uses invalid/rescinded offers. (Maybe this should be a 
TASK_ERROR?)
- The master is asked to launch a task on an agent that has since been removed 
or disconnected.
- The scheduler tried to reconcile a task unknown to Mesos.
- When a master discovers that a slave process has exited, it reports TASK_LOST 
for any tasks from non-checkpointing frameworks.
- If an agent is shut down/removed completely, then all of its tasks will 
report TASK_LOST.
- Upon agent reregistration, any tasks known by the master but unknown by the 
agent will report TASK_LOST.
- Agent could not launch the task because it failed to unschedule directories 
for garbage collection.
- If the task/executor uses persistent volumes unknown to the agent.
- If the agent is asked to run a task using an existing executor that is 
terminating/terminated.
- Agent asked to killTask for an unrecognized executor.
- Executor reregistration timeout expired.
- Failed to update resources for executor container (e.g. grow to launch new 
task).
- Container/executor preempted by QoS controller.
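
To make the retry distinction above concrete, here is a minimal sketch in Java. 
It is only an illustration, not Myriad or Mesos code: the class TaskRetryPolicy 
and its State enum are hypothetical local stand-ins for the three status names 
mentioned above, and treating TASK_FAILED as "never retry" is a simplification 
of "likely to fail again".

```java
import java.util.EnumSet;
import java.util.Set;

/** Sketch: decide whether a terminal task status is worth a relaunch. */
final class TaskRetryPolicy {
    /** Hypothetical stand-in for the Mesos task states named in this comment. */
    enum State { TASK_LOST, TASK_FAILED, TASK_ERROR }

    // TASK_LOST usually points at infrastructure trouble (partition, dead
    // agent), so relaunching the same task elsewhere may succeed.
    private static final Set<State> RETRYABLE = EnumSet.of(State.TASK_LOST);

    static boolean shouldRetry(State state) {
        return RETRYABLE.contains(state);
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " -> retry? " + shouldRetry(s));
        }
    }
}
```

A scheduler could consult such a policy in its status-update handler before 
deciding whether to put a task back in the pending set.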

> staging - pending loop
> --
>
> Key: MYRIAD-18
> URL: https://issues.apache.org/jira/browse/MYRIAD-18
> Project: Myriad
> Issue Type: Bug
> Reporter: Maysam Yabandeh
>
> If a staging task is lost for any reason, it gets stuck in a staging-pending 
> loop:
> case TASK_LOST:
>   schedulerState.makeTaskPending(taskId);
>   break;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MYRIAD-18) staging - pending loop

2015-10-22 Thread Santosh Marella (JIRA)

[ 
https://issues.apache.org/jira/browse/MYRIAD-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14968921#comment-14968921
 ] 

Santosh Marella commented on MYRIAD-18:
---

The task in question is a Node Manager. I think a TASK_LOST can be received 
if the Mesos agent dies while the task is being launched. In that case, I think 
it's reasonable to move the NM task to pending and have it relaunched on 
another Mesos agent node.

[~adam-mesos], thoughts?
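
For reference, the staging-to-pending transition discussed here could be 
sketched as below. This is a toy model, not Myriad's SchedulerState: the class 
ToySchedulerState, its String task ids, and the per-task loss counter are all 
assumptions made for illustration. The counter is not part of the reported 
code; it only shows how a caller could notice that a task keeps bouncing 
between staging and pending.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the pending/staging sets (hypothetical, for illustration). */
final class ToySchedulerState {
    private final Set<String> pending = new LinkedHashSet<>();
    private final Set<String> staging = new LinkedHashSet<>();
    private final Map<String, Integer> lostCount = new HashMap<>();

    void addPending(String taskId) {
        pending.add(taskId);
    }

    /** A pending task was launched on an offer and is now staging. */
    void makeTaskStaging(String taskId) {
        pending.remove(taskId);
        staging.add(taskId);
    }

    /**
     * Mirrors the TASK_LOST case: the task goes back to pending so it can be
     * relaunched on another agent. Returns how many times it has been lost.
     */
    int onTaskLost(String taskId) {
        staging.remove(taskId);
        pending.add(taskId);
        return lostCount.merge(taskId, 1, Integer::sum);
    }

    boolean isPending(String taskId) { return pending.contains(taskId); }

    boolean isStaging(String taskId) { return staging.contains(taskId); }
}
```

If the same task is lost on every launch attempt, onTaskLost keeps returning a 
growing count, which is the loop described in the issue.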

> staging - pending loop
> --
>
> Key: MYRIAD-18
> URL: https://issues.apache.org/jira/browse/MYRIAD-18
> Project: Myriad
> Issue Type: Bug
> Reporter: Maysam Yabandeh
> Fix For: Myriad 0.1.0
>
>
> If a staging task is lost for any reason, it gets stuck in a staging-pending 
> loop:
> case TASK_LOST:
>   schedulerState.makeTaskPending(taskId);
>   break;





[jira] [Commented] (MYRIAD-18) staging - pending loop

2015-10-21 Thread Yuliya Feldman (JIRA)

[ 
https://issues.apache.org/jira/browse/MYRIAD-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967852#comment-14967852
 ] 

Yuliya Feldman commented on MYRIAD-18:
--

I believe you can now flex those down, since we changed the order and added the 
profile of the NM you are flexing down: flex-down now removes pending tasks 
first, then staging tasks, and only then active tasks.
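
A rough sketch of that selection order, for illustration only. The names 
FlexDownOrder, Phase, Task and pick are hypothetical and not taken from the 
Myriad code base; the sketch only assumes that each task carries a profile 
name and one of the three phases mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: choose tasks to flex down, least-disruptive phase first. */
final class FlexDownOrder {
    /** Declaration order is the kill order: pending, staging, then active. */
    enum Phase { PENDING, STAGING, ACTIVE }

    /** Hypothetical task record: id, NM profile name, current phase. */
    static final class Task {
        final String id;
        final String profile;
        final Phase phase;

        Task(String id, String profile, Phase phase) {
            this.id = id;
            this.profile = profile;
            this.phase = phase;
        }
    }

    /** Returns up to count task ids of the given profile, in kill order. */
    static List<String> pick(List<Task> tasks, String profile, int count) {
        List<String> picked = new ArrayList<>();
        for (Phase phase : Phase.values()) {
            for (Task t : tasks) {
                if (picked.size() == count) {
                    return picked;
                }
                if (t.phase == phase && t.profile.equals(profile)) {
                    picked.add(t.id);
                }
            }
        }
        return picked;
    }
}
```

With this ordering, a task stuck bouncing between pending and staging is 
picked before any running NM of the same profile.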

> staging - pending loop
> --
>
> Key: MYRIAD-18
> URL: https://issues.apache.org/jira/browse/MYRIAD-18
> Project: Myriad
> Issue Type: Bug
> Reporter: Maysam Yabandeh
> Fix For: Myriad 0.1.0
>
>
> If a staging task is lost for any reason, it gets stuck in a staging-pending 
> loop:
> case TASK_LOST:
>   schedulerState.makeTaskPending(taskId);
>   break;


