[ https://issues.apache.org/jira/browse/HADOOP-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12547625 ]
Owen O'Malley commented on HADOOP-2327:
---------------------------------------
I think this should be more general than streaming and be in the general
framework. I'd propose a solution that allows the job to control which maps and
reduces are run, by number, using two configuration parameters:
-Dmapred.map.only-run=1,20-100,103
-Dmapred.reduce.only-run=4
would run maps 1, 20, 21, ..., 100, and 103 and reduce 4.
Would that meet your requirements, Arkady?
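
To make the proposed syntax concrete, here is a minimal sketch of how a value
such as "1,20-100,103" could be expanded into task numbers. It is written in
Python for brevity and is not Hadoop code. The function names
parse_task_ranges and should_run are hypothetical, the property names
mapred.map.only-run and mapred.reduce.only-run are only the ones proposed
above, and treating a missing or empty value as "run everything" is an
assumption.

```python
def parse_task_ranges(spec):
    """Expand a spec such as "1,20-100,103" into a sorted list of task numbers.

    Hypothetical helper: the syntax follows the proposal in this comment
    (comma-separated numbers and inclusive lo-hi ranges).
    """
    tasks = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas and surrounding whitespace
        if "-" in part:
            lo_text, hi_text = part.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
            if lo > hi:
                raise ValueError("invalid range: " + part)
            tasks.update(range(lo, hi + 1))  # ranges are inclusive at both ends
        else:
            tasks.add(int(part))
    return sorted(tasks)


def should_run(task_number, spec):
    """Return True if the task is selected by the spec.

    Assumption: a missing or empty spec places no restriction on the job.
    """
    if spec is None or not spec.strip():
        return True
    return task_number in set(parse_task_ranges(spec))
```

With the values from the example above, parse_task_ranges("1,20-100,103")
yields 83 task numbers (1, then 20 through 100, then 103), and
should_run(4, "4") is true while should_run(5, "4") is false.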
> Streaming: need to be able to re-run specific map tasks (when -reducer NONE)
> ----------------------------------------------------------------------------
>
> Key: HADOOP-2327
> URL: https://issues.apache.org/jira/browse/HADOOP-2327
> Project: Hadoop
> Issue Type: Improvement
> Components: contrib/streaming
> Reporter: arkady borkovsky
>
> Sometimes a few map tasks fail in a job run with -reducer NONE.
> It should be possible to rerun the failed map tasks.
> There are several failure modes:
> * a task is hanging, so the job is killed
> * from the infrastructure perspective, the task has completed successfully,
> but it failed to produce the correct result
> * failed in the proper Hadoop sense
> It is often too expensive to rerun the whole job. And for larger jobs,
> chances are each run will have a few failed tasks.