[
https://issues.apache.org/jira/browse/MESOS-245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benjamin Mahler resolved MESOS-245.
-----------------------------------
Resolution: Won't Fix
We recently rewrote the Hadoop scheduler and executor from scratch, so this
should no longer be an issue. Can you confirm?
> Hadoop framework sometimes won't rerun failed map tasks
> -------------------------------------------------------
>
> Key: MESOS-245
> URL: https://issues.apache.org/jira/browse/MESOS-245
> Project: Mesos
> Issue Type: Bug
> Components: framework
> Reporter: Charles Reiss
> Assignee: Charles Reiss
>
> The Mesos framework for Hadoop can occasionally fail to run map tasks, for
> two reasons:
> - It looks for runnable map tasks by examining lists that are not updated
> when a map task fails or is killed. When the only runnable map tasks are
> failed or killed ones, it never attempts to launch a new map task. (If any
> other map task is runnable, it calls a normal Hadoop function to obtain the
> task, and that function accounts for tasks that need rerunning.)
> - If all available resources are used by reduce tasks and the map outputs
> needed by those reduces become unusable, Hadoop cannot rerun the map
> task(s) because it never receives a suitable offer. A workaround is to
> configure reduce-slots-per-machine limits so that the framework never
> saturates all resources with reduce tasks. A better fix would be for the
> framework to detect the deadlock and kill a reduce task to resolve it.
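
The "better fix" suggested in the description (detect the deadlock, then kill
a reduce task) could look roughly like the sketch below. This is not code from
the Mesos Hadoop framework: the Task record, the pick_reduce_to_kill function,
and the free_slots parameter are hypothetical names chosen for illustration,
and Python is used only for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    # Hypothetical task record; not a Hadoop or Mesos type.
    task_id: str
    kind: str   # "map" or "reduce"
    state: str  # "pending", "running", "failed" or "done"


def pick_reduce_to_kill(tasks: List[Task], free_slots: int) -> Optional[Task]:
    """Return a running reduce task to kill, or None if no deadlock is seen.

    Assumed deadlock condition: some map task still has to (re)run, no slot
    is free, and no map task is running that would free a slot on completion.
    """
    maps_waiting = any(
        t.kind == "map" and t.state in ("pending", "failed") for t in tasks
    )
    maps_running = any(t.kind == "map" and t.state == "running" for t in tasks)
    if not maps_waiting or free_slots > 0 or maps_running:
        return None

    # Every occupied slot is held by a reduce; pick the first one as victim.
    running_reduces = [
        t for t in tasks if t.kind == "reduce" and t.state == "running"
    ]
    return running_reduces[0] if running_reduces else None
```

A real scheduler would probably choose the victim more carefully, for example
the reduce task that has made the least progress, and would need to make sure
the freed resources are actually offered back and used for the map task.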