[jira] [Commented] (MAPREDUCE-4951) Container preemption interpreted as task failure

Bikas Saha (JIRA) Wed, 23 Jan 2013 10:23:13 -0800

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560903#comment-13560903 ]


Bikas Saha commented on MAPREDUCE-4951:
---------------------------------------

We might be digressing from this jira here, but I really don't think the 2-step 
approach is worth its complexity. The main scenario where it makes sense is 
when the task has the ability to checkpoint its work before getting preempted. I 
haven't seen this capability outside of basic research prototypes. It's much 
simpler to have preemption be an RM-only action. We do need to fix the 
action and information loop so that AMs can get correct information about the 
infrastructure's actions.
                
> Container preemption interpreted as task failure
> ------------------------------------------------
>
>                 Key: MAPREDUCE-4951
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4951
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, mr-am, mrv2
>    Affects Versions: 2.0.2-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-4951-1.patch, MAPREDUCE-4951-2.patch, 
> MAPREDUCE-4951.patch
>
>
> When YARN reports a completed container to the MR AM, it always interprets it 
> as a failure.  This can lead to a job failing because too many of its tasks 
> failed, when in fact they only failed because the scheduler preempted them.
> MR needs to recognize the special exit code value of -100 and interpret it as 
> a container being killed instead of a container failure.
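
To make the description above concrete, here is a minimal, self-contained
sketch of the distinction it asks for. It is not the attached patch and does
not use the Hadoop API: the class, enum, and constant names are hypothetical,
and treating exit status 0 as success is an assumption. Only the value -100
comes from the description.

```java
public class ContainerExitClassifier {
    // Value taken from the issue description; the constant name is hypothetical.
    static final int ABORTED_EXIT_STATUS = -100;

    // Hypothetical outcomes an AM might record for a task attempt.
    enum Outcome { SUCCEEDED, KILLED, FAILED }

    // Map a completed container's exit status to an attempt outcome.
    static Outcome classify(int exitStatus) {
        if (exitStatus == 0) {
            return Outcome.SUCCEEDED;   // assumption: 0 means a clean exit
        }
        if (exitStatus == ABORTED_EXIT_STATUS) {
            return Outcome.KILLED;      // preempted or aborted, not the task's fault
        }
        return Outcome.FAILED;          // anything else counts as a real failure
    }

    public static void main(String[] args) {
        // Two of these four containers were preempted (-100).
        int[] statuses = {0, 1, -100, -100};
        int failures = 0;
        for (int status : statuses) {
            if (classify(status) == Outcome.FAILED) {
                failures++;
            }
        }
        // Only the container that exited with 1 counts toward the failure limit.
        System.out.println("failed attempts counted: " + failures);
    }
}
```

With this kind of mapping, preempted containers would be rescheduled without
counting against the job's failed-task limit.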

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
