[ https://issues.apache.org/jira/browse/MAPREDUCE-969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756299#action_12756299 ]
Todd Lipcon commented on MAPREDUCE-969:
---------------------------------------
Hi Jothi,
This occurred again on the same cluster - same symptoms/traces/etc, so not
uploading anything new. I looked through the tasktracker logs and was unable to
find anything suspicious. The patch in HADOOP-4744 (r772846) is not in 0.20.0,
so we don't have the port checks or info printouts.
It does appear to be very similar to HADOOP, though - the TT with the -1 port
(xx28 in this case) ran several tasks before it was eventually shut down by
ops. All of the jobs that had map tasks run on xx28 eventually went into this
state.
We'll apply the second patch from HADOOP-4744 on this cluster and report back
whether it solves the problem. Feeling pretty good that it will.
> NullPointerException during reduce freezes job
> ----------------------------------------------
>
> Key: MAPREDUCE-969
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-969
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobtracker, task, tasktracker
> Affects Versions: 0.20.2
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: bad_job_events, bad_job_jt_logs, reduce_task_logs
>
>
> We experienced several jobs stuck in Reduce on a cluster. All of the stuck
> reduce tasks were stuck at "Need another 2 map output(s) where 0 is already
> in progress" despite all of the mappers having completed and 0 being
> scheduled. The stuck reducers had experienced the following exception early
> in the shuffle:
> java.lang.NullPointerException
>   at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2747)
>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2670)
> Will attach more information and logs momentarily.
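The top frame of the trace is ConcurrentHashMap.get, which throws NullPointerException whenever it is passed a null key. The sketch below shows one way a tasktracker advertising port -1 could lead to such a null key. It assumes the reducer derives a host name from a tasktracker HTTP URL before the map lookup; that is an assumption for illustration, not a copy of the ReduceTask code. The class name, method names, URLs, and the port 50060 are all hypothetical.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: names here are hypothetical, not taken from Hadoop.
class NullKeyLookupSketch {

    // Extracts the host from a tasktracker-style HTTP URL. java.net.URI
    // returns null from getHost() when the authority cannot be parsed as
    // host:port, which is the case for a non-numeric port such as "-1".
    static String hostOf(String trackerHttpUrl) {
        return URI.create(trackerHttpUrl).getHost();
    }

    // Returns true if using the URL's host as a ConcurrentHashMap key
    // raises NullPointerException, false if the lookup completes.
    static boolean lookupThrowsNpe(String trackerHttpUrl) {
        Map<String, Integer> outputsByHost = new ConcurrentHashMap<>();
        try {
            outputsByHost.get(hostOf(trackerHttpUrl));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A well-formed URL versus one carrying the -1 port.
        System.out.println(lookupThrowsNpe("http://xx28:50060"));
        System.out.println(lookupThrowsNpe("http://xx28:-1"));
    }
}
```

If the reducer does build its lookup key this way, a port check at tasktracker registration (as described for HADOOP-4744) would keep the malformed URL from ever reaching the shuffle, which is preferable to catching the exception in the events thread.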