[jira] [Updated] (MAPREDUCE-2129) Job may hang if mapreduce.job.committer.setup.cleanup.needed=false and mapreduce.map/reduce.failures.maxpercent>0

Subroto Sanyal (JIRA) Tue, 09 Aug 2011 04:06:08 -0700

     [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Subroto Sanyal updated MAPREDUCE-2129:
--------------------------------------

    Labels: hadoop  (was: )
    Status: Patch Available  (was: Open)

Aaah... got the problem  :-)
The root cause of the problem is:
Before scheduling reduces, a check is made in JobInProgress: *boolean 
org.apache.hadoop.mapred.JobInProgress.scheduleReduces()*, which tests the 
following condition:
*finishedMapTasks >= completedMapsForReduceSlowstart*
This check is valid if *mapred.max.map.failures.percent* is left at 
0 (the default value), but when the value is set above 0, the above-mentioned 
check is no longer valid.
For example, say a job spawns 100 map tasks, *mapred.max.map.failures.percent* 
is set to 5 percent, and *mapred.reduce.slowstart.completed.maps* is 1 (so all 
100 maps must finish). In this scenario reducers should be scheduled even if 
only 95 maps are successful. But looking back at the above-mentioned check, the 
condition will never be satisfied if 5 map tasks fail, because 
95 >= 100 is always false.
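
To make the arithmetic concrete, here is a minimal sketch of that comparison. It is a simplified model and not the actual JobInProgress code: the class name, the method signatures, and the way the threshold is derived from the slowstart fraction (rounding up) are assumptions made for illustration.

```java
public class SlowstartCheck {

    // Assumed derivation of the threshold: the slowstart fraction of all
    // map tasks, rounded up. Hypothetical helper, not the Hadoop source.
    static int slowstartThreshold(float slowstartFraction, int totalMaps) {
        return (int) Math.ceil(slowstartFraction * totalMaps);
    }

    // Simplified model of the condition quoted above.
    static boolean scheduleReduces(int finishedMapTasks,
                                   int completedMapsForReduceSlowstart) {
        return finishedMapTasks >= completedMapsForReduceSlowstart;
    }

    public static void main(String[] args) {
        int totalMaps = 100;
        int failedMaps = 5; // within the tolerated 5 percent
        int threshold = slowstartThreshold(1.0f, totalMaps);

        // The failed maps never count as finished, so the left-hand side
        // tops out at 95 while the threshold stays at 100.
        int finishedMapTasks = totalMaps - failedMaps;
        System.out.println(finishedMapTasks + " >= " + threshold + " : "
                + scheduleReduces(finishedMapTasks, threshold));
    }
}
```

The point of the sketch is only that the threshold is computed from the total number of maps, while the counter on the left can never include the tolerated failures.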

As per my understanding, the issue has nothing to do with 
*mapreduce.job.committer.setup.cleanup.needed*.

Kang,

We can't call *void org.apache.hadoop.mapred.JobInProgress.jobComplete()* 
from *void org.apache.hadoop.mapred.JobInProgress.failedTask(TaskInProgress 
tip, TaskAttemptID taskid, TaskStatus status, TaskTracker taskTracker, boolean 
wasRunning, boolean wasComplete, boolean wasAttemptRunning)*, because that 
method is called upon the failure of a task, whereas we need to wait until 
*completedMapsForReduceSlowstart* is reached so that the reduces are spawned.

In my scenario there was a job (wordcount) with 111 mappers.
The value of *mapred.reduce.slowstart.completed.maps* was set to 1 (100%). The 
value of *mapreduce.map.failures.maxpercent* was set to 5 (5%). The Mapper 
implementation was tweaked in such a way that 4 mappers failed.
After some time I noticed that 107 mappers had completed, but the reduces were 
not running and the job stayed stuck indefinitely.
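
The numbers from this run can be plugged into a small self-contained model to show why the job neither fails nor progresses. This is only a sketch under assumptions: the class and helper names are hypothetical, and the failure-tolerance test (failed maps as a percentage of all maps) is my reading of how the max-percent setting is applied, not a quote from the JobTracker code.

```java
public class HangScenario {

    // Assumed tolerance rule: the job keeps running while the failed maps
    // do not exceed maxFailuresPercent of all maps. Hypothetical helper.
    static boolean withinFailureTolerance(int failedMaps, int totalMaps,
                                          int maxFailuresPercent) {
        return failedMaps * 100 <= maxFailuresPercent * totalMaps;
    }

    // The job is stuck when it has not been failed, every remaining map
    // has finished, and the slowstart threshold can still not be reached.
    static boolean isStuck(int totalMaps, int failedMaps,
                           int maxFailuresPercent, float slowstartFraction) {
        int finishedMaps = totalMaps - failedMaps;
        int threshold = (int) Math.ceil(slowstartFraction * totalMaps);
        return withinFailureTolerance(failedMaps, totalMaps, maxFailuresPercent)
                && finishedMaps < threshold;
    }

    public static void main(String[] args) {
        // 111 mappers, 4 failed, 5 percent tolerated, slowstart at 1 (100%).
        System.out.println("stuck: " + isStuck(111, 4, 5, 1.0f));
        // Same job with no failed mappers, for comparison.
        System.out.println("stuck: " + isStuck(111, 0, 5, 1.0f));
    }
}
```

With 4 failures out of 111 the job stays under the 5 percent limit, so it is not failed, yet only 107 maps can ever finish against a threshold of 111.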

> Job may hang if mapreduce.job.committer.setup.cleanup.needed=false and 
> mapreduce.map/reduce.failures.maxpercent>0
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2129
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2129
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.21.0, 0.20.2, 0.20.1, 0.20.3, 0.21.1, 0.22.0
>            Reporter: Kang Xiao
>              Labels: hadoop
>         Attachments: MAPREDUCE-2129.patch, MAPREDUCE-2129.patch
>
>
> Job may hang at RUNNING state if 
> mapreduce.job.committer.setup.cleanup.needed=false and 
> mapreduce.map/reduce.failures.maxpercent>0. It happens when some tasks fail 
> but haven't reached failures.maxpercent.

