[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217228#comment-13217228
 ] 

Amar Kamat commented on MAPREDUCE-3926:
---------------------------------------

Mitesh,
I guess adding this to 0.20.205 might involve a lot of change. Also, the JT has 
no information about the state of the running tasks, i.e. they could in fact be 
RUNNING, KILLED, FAILED, PENDING, etc.

Note that this can happen for SUCCESSFUL jobs too. The job can still 
complete/finish while the speculative tasks are running. In such cases, there 
is no information about the speculative tasks logged in the job history.

This can surely be fixed in trunk.
                
> No information of unfinished map task in Job History, if all attempts of 
> another map task fail.
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3926
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3926
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>    Affects Versions: 0.20.205.0
>            Reporter: Mitesh Singh Jat
>            Priority: Minor
>
> No information of unfinished map task in Job History, if all attempts of 
> another map task fail.
> For example, 
> 1. The first map task's first attempt m_000000_0 was making progress
> 2. The second map task failed 4 times, before completion of first map task 
> attempt.
> 3. Hence, a job cleanup task was launched and completed, before completion of 
> first map task attempt.
> 4. After the job cleanup task, runningMapCache is cleared (set to null):
> {noformat}
> completedTask() -> jobComplete() -> garbageCollect() -> this.runningMapCache = null;
>            |-----> retireMap() -> if (runningMapCache == null) "Running cache for maps missing!! Job details are missing."
> {noformat}
> 5. Hence, the error "Running cache for maps missing!! Job details are missing."
> is logged (from retireMap(), which is called after jobComplete()), and no
> further information is added to the Job History. Therefore, the first map
> task's information is missing from the Job History page.
> I have created a sample streaming MR job, to reproduce this issue.
> {code:title=mapper.sh}
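> #!/bin/bash
> # Read the single input line that selects this mapper's behaviour.
> read -r line
> if [[ "$line" == "sleep" ]]
> then
>     # Long-running map task: log to stderr and sleep for about 15 seconds.
>     for i in 1 2 3
>     do
>         echo "Sleeping" >&2
>         sleep 5
>     done
>     exit 0
> else
>     # Failing map task: the non-zero exit status fails the attempt.
>     echo "Exiting" >&2
>     exit -1
> fi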
> {code}
> Input file: in1.txt is for long running map task (here first map task)
> {code:title=/user/mitesh/input/in1.txt}
> sleep
> {code}
> Input file: in2.txt is for failing map task (here second map task)
> {code:title=/user/mitesh/input/in2.txt}
> exit
> {code}
> Running the sample streaming MR job.
> {noformat}
> $ hadoop fs -rmr -skipTrash xyz
> $ hadoop jar $HADOOP_INSTALL/hadoop-streaming.jar \
>     -Dmapred.map.max.attempts=2 -Dmapred.min.split.size=7 -Dmapred.map.tasks=2 \
>     -mapper "mapper.sh" -file mapper.sh -reducer NONE \
>     -input /user/mitesh/input/in1.txt -input /user/mitesh/input/in2.txt -output xyz
> {noformat}
> Job History web UI
> {noformat}
> Hadoop Job job_201201310454_542302 on History Viewer
> User: mitesh
> JobName: streamjob7439640883203077520.jar
> JobConf: hdfs://nn:port/user/mitesh/.staging/job_201201310454_542302/job.xml
> Job-ACLs:
>     mapreduce.job.acl-view-job: No users are allowed
>     mapreduce.job.acl-modify-job: No users are allowed
> Submitted At: 27-Feb-2012 12:56:02
> Launched At: 27-Feb-2012 12:56:11 (8sec)
> Finished At: 27-Feb-2012 12:56:31 (20sec)
> Status: FAILED
> Failure Info: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. 
> LastFailedTask: task_201201310454_542302_m_000001
> Analyse This Job
> Kind     Total Tasks(successful+failed+killed)  Successful tasks  Failed tasks  Killed tasks  Start Time            Finish Time
> Setup    1                                      1                 0             0             27-Feb-2012 12:56:12  27-Feb-2012 12:56:16 (4sec)
> Map      2                                      0                 2             0             27-Feb-2012 12:56:16  27-Feb-2012 12:56:26 (10sec)
> Reduce   0                                      0                 0             0
> Cleanup  1                                      1                 0             0             27-Feb-2012 12:56:26  27-Feb-2012 12:56:31 (4sec)
> {noformat}
> The output above shows only 2 failed tasks (both belonging to the second map
> task). The task tracker of the first map task can be found only from the JT
> logs.
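
To make the ordering in step 4 of the description concrete, here is a minimal
sketch in Java. It is a toy model, not the actual JobTracker code: the class
RetireAfterCleanupSketch, the launch() and flushThenGarbageCollect() methods,
the tracker name and the history list are made up for illustration; only the
names runningMapCache, garbageCollect() and retireMap() come from the
description. The flush step is just one possible direction and not necessarily
how this would be fixed in trunk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the ordering problem; NOT the real JobTracker code. */
public class RetireAfterCleanupSketch {
    // Hypothetical stand-in for the per-job cache of running map attempts.
    private Map<String, String> runningMapCache = new HashMap<>();
    // Hypothetical stand-in for the entries written to the job history.
    private final List<String> history = new ArrayList<>();

    /** Registers a running attempt together with the tracker it runs on. */
    public void launch(String attemptId, String tracker) {
        runningMapCache.put(attemptId, tracker);
    }

    /** Models garbageCollect(): the cache is dropped once the job is done. */
    public void garbageCollect() {
        runningMapCache = null;
    }

    /** Models retireMap(): nothing can be recorded once the cache is gone. */
    public boolean retireMap(String attemptId) {
        if (runningMapCache == null) {
            // Corresponds to the "Running cache for maps missing!!" case.
            return false;
        }
        String tracker = runningMapCache.remove(attemptId);
        if (tracker != null) {
            history.add(attemptId + " on " + tracker);
        }
        return tracker != null;
    }

    /** Assumed remedy: record unfinished attempts before dropping the cache. */
    public void flushThenGarbageCollect() {
        if (runningMapCache != null) {
            for (Map.Entry<String, String> e : runningMapCache.entrySet()) {
                history.add(e.getKey() + " on " + e.getValue() + " (unfinished)");
            }
        }
        garbageCollect();
    }

    public List<String> history() {
        return history;
    }

    public static void main(String[] args) {
        // Order from the report: cache dropped first, retire attempted later.
        RetireAfterCleanupSketch reported = new RetireAfterCleanupSketch();
        reported.launch("m_000000_0", "tracker-a");
        reported.garbageCollect();
        System.out.println(reported.retireMap("m_000000_0") + " " + reported.history());

        // Assumed alternative: flush the unfinished attempts before cleanup.
        RetireAfterCleanupSketch flushed = new RetireAfterCleanupSketch();
        flushed.launch("m_000000_0", "tracker-a");
        flushed.flushThenGarbageCollect();
        System.out.println(flushed.history());
    }
}
```

In the first case the attempt never reaches the history list, which matches
the missing first map task in the History Viewer output. In the second case
the attempt is recorded only as "unfinished", since the real state of the
attempt is not known at that point.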

