Vincent Behar created MAPREDUCE-4867:
----------------------------------------
Summary: reduce tasks won't start in certain circumstances
Key: MAPREDUCE-4867
URL: https://issues.apache.org/jira/browse/MAPREDUCE-4867
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: scheduler
Affects Versions: 1.0.4
Reporter: Vincent Behar
The start of reduce tasks is conditioned by the value of
"mapred.reduce.slowstart.completed.maps". However, if the number of completed
map tasks never reaches the configured value (for example because
"mapred.max.map.failures.percent" has been set to a high value, to permit a job
to have a lot of failed tasks), then the reduce tasks never start.
The job then stays in the running state: all map tasks are finished (either
successfully or not) and all reduce tasks remain pending. The only thing one
can do is to kill the job.
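
To make the stuck state concrete, here is a minimal sketch of the slowstart gate as described above. It is a simplified model, not the actual Hadoop source: the class name, the method names and the assumption that only successful maps count toward the threshold are illustrative.

```java
// Simplified model of the slowstart gate described in this report.
// Assumption: only successfully completed maps count toward the threshold.
final class SlowstartGateModel {

    // Number of successful maps required before reduces may be scheduled.
    static int requiredMaps(int totalMaps, double slowstart) {
        return (int) Math.ceil(slowstart * totalMaps);
    }

    // Assumed gate: reduces start only once enough maps have succeeded.
    static boolean reducesMayStart(int totalMaps, int succeededMaps, double slowstart) {
        return succeededMaps >= requiredMaps(totalMaps, slowstart);
    }

    public static void main(String[] args) {
        int totalMaps = 100;
        double slowstart = 0.75;  // mapred.reduce.slowstart.completed.maps
        int failedMaps = 30;      // tolerated when mapred.max.map.failures.percent >= 30
        int succeededMaps = totalMaps - failedMaps;

        // All maps are finished, but 70 < 75, so the gate never opens.
        System.out.println(reducesMayStart(totalMaps, succeededMaps, slowstart));
    }
}
```

With these example values the job is not failed (30% of failures is tolerated), yet the 75 successful maps that the gate waits for can never be reached.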
There are 2 things that could be done:
- document the relation between "mapred.max.map.failures.percent" and
"mapred.reduce.slowstart.completed.maps": the rule to follow if you want to be
sure that your reduce tasks will start is:
"mapred.reduce.slowstart.completed.maps * 100 < 100 -
mapred.max.map.failures.percent"
- fix JobInProgress.scheduleReduces() to return true if all map tasks are
finished
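
Both suggestions can be sketched in a few lines. This is a hypothetical illustration under stated assumptions, not a patch: the class, the method signatures and the parameters are invented for the example and do not mirror the real JobInProgress fields.

```java
// Hypothetical sketch of the two suggestions; not the real JobInProgress code.
final class SlowstartFixSketch {

    // Suggestion 1: the documented rule,
    // slowstart * 100 < 100 - mapred.max.map.failures.percent
    static boolean settingsAreSafe(double slowstart, int maxMapFailuresPercent) {
        return slowstart * 100 < 100 - maxMapFailuresPercent;
    }

    // Suggestion 2: also open the gate once every map task is finished,
    // whether it succeeded or failed.
    static boolean scheduleReduces(int totalMaps, int succeededMaps,
                                   int failedMaps, double slowstart) {
        int required = (int) Math.ceil(slowstart * totalMaps);
        boolean thresholdReached = succeededMaps >= required;
        boolean allMapsFinished = succeededMaps + failedMaps >= totalMaps;
        return thresholdReached || allMapsFinished;
    }

    public static void main(String[] args) {
        // 0.75 slowstart with 40% tolerated failures violates the rule.
        System.out.println(settingsAreSafe(0.75, 40));
        // 70 succeeded + 30 failed out of 100: all maps finished, reduces start.
        System.out.println(scheduleReduces(100, 70, 30, 0.75));
    }
}
```

The first check only warns about a risky configuration, while the second removes the stuck state whatever the settings are.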