[jira] [Resolved] (MESOS-245) Hadoop framework sometimes won't rerun failed map tasks

Benjamin Mahler (JIRA) Thu, 07 Feb 2013 15:25:13 -0800

     [ https://issues.apache.org/jira/browse/MESOS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Benjamin Mahler resolved MESOS-245.
-----------------------------------

    Resolution: Won't Fix

We recently rewrote the Hadoop scheduler and executor from scratch, so this 
should no longer be an issue. Can you confirm?
                
> Hadoop framework sometimes won't rerun failed map tasks
> -------------------------------------------------------
>
>                 Key: MESOS-245
>                 URL: https://issues.apache.org/jira/browse/MESOS-245
>             Project: Mesos
>          Issue Type: Bug
>          Components: framework
>            Reporter: Charles Reiss
>            Assignee: Charles Reiss
>
> There are two things which can occasionally cause the Mesos framework for 
> Hadoop to fail to run map tasks:
> - It looks for runnable map tasks by examining lists that are not updated 
> when a map task fails or is killed. When the only map tasks left to run are 
> reruns of failed or killed tasks, it never attempts to launch a new map task. 
> (If any other map task is runnable, it calls a normal Hadoop function to 
> obtain the task, which accounts for the rerun as well.)
> - If all available resources are used by reduce tasks and map outputs needed 
> by those reduces become unusable, Hadoop cannot rerun the map task(s) because 
> it never receives a suitable offer. A workaround is to configure 
> reduce-slots-per-machine limits so that the framework never saturates all 
> the resources with reduce tasks. A better fix would be for the framework to 
> detect the deadlock and kill a reduce task to resolve it.
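
The deadlock check suggested in the description could look roughly like the
sketch below. It is illustrative only: Task, needs_map_slot and
pick_reduce_to_kill are hypothetical names, not part of the Mesos or Hadoop
APIs, and it assumes the task list is ordered by launch time, so that killing
the last running reduce discards the least work.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    # Hypothetical record; the real framework tracks tasks differently.
    task_id: str
    kind: str   # "map" or "reduce"
    state: str  # "pending", "running", "failed", "killed", or "done"


def needs_map_slot(tasks: List[Task]) -> bool:
    """True if any map task must be (re)run, counting failed and killed ones."""
    return any(
        t.kind == "map" and t.state in ("pending", "failed", "killed")
        for t in tasks
    )


def pick_reduce_to_kill(tasks: List[Task], free_slots: int) -> Optional[Task]:
    """Return a running reduce to kill if the job looks deadlocked, else None.

    Deadlock here means: a map must be rerun, no slot is free, and no map
    is running (so no slot will be freed by map work finishing).
    """
    if free_slots > 0 or not needs_map_slot(tasks):
        return None
    if any(t.kind == "map" and t.state == "running" for t in tasks):
        return None
    running_reduces = [
        t for t in tasks if t.kind == "reduce" and t.state == "running"
    ]
    # Assumption: the list is in launch order, so the last reduce is the newest.
    return running_reduces[-1] if running_reduces else None
```

Because needs_map_slot counts failed and killed maps as work still to be
scheduled, the same check also covers the first problem in the description,
where such tasks were missing from the lists being examined.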

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
