[ https://issues.apache.org/jira/browse/HADOOP-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620176#action_12620176 ]

Amareshwari Sriramadasu commented on HADOOP-3155:
-------------------------------------------------

{code}
"Map-events fetcher for all reduce tasks on 
tracker_HOST-6026:localhost.localdomain/127.0.0.1:53932" daemon prio=10 
tid=0x7ff5d800 nid=0x510b in Object.wait() [0x81e0a000..0x81e0a1b0]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x87981ee0> (a java.util.TreeMap)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.hadoop.mapred.TaskTracker$MapEventsFetcherThread.run(TaskTracker.java:511)
        - locked <0x87981ee0> (a java.util.TreeMap)
{code}
The above _kill -QUIT_ output from the TaskTracker shows 
MapEventsFetcherThread in the WAITING state, which is not expected in this 
scenario because reduces are still in shuffle.

The stack trace shows the thread waiting at line 511 of TaskTracker.java:
{code}
509            while (((fList = reducesInShuffle()).size()) == 0) {
510              try {
511                runningJobs.wait();
512              } catch (InterruptedException e) {
513                LOG.info("Shutting down: " + getName());
514                return;
515              }
516            }
{code}
MapEventsFetcherThread leaves this loop only when reducesInShuffle() returns 
a non-empty list.
One possible scenario is that the thread saw no reduces in shuffle, started 
waiting, and then missed the notifications. But the reducers have already 
fetched some map outputs, which means the thread did wake up and fetch some 
events. And since the reduces are still in shuffle, the fetcher thread should 
not have gone back to waiting.
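
For reference, the guarded-wait pattern under discussion can be sketched in 
isolation. This is only an illustration, not the TaskTracker code: the class 
GuardedWaitSketch, the methods awaitReduces() and addReduce(), and the 
Boolean "in shuffle" flag are hypothetical stand-ins, and the timeout is an 
assumption added so the sketch cannot hang (the loop at line 511 uses an 
untimed wait()). It shows that a waiter which re-checks the predicate under 
the same monitor that the notifier holds while updating the map does not 
miss a wake-up.

```java
import java.util.TreeMap;

// Hypothetical stand-in for the TaskTracker state; not the real Hadoop class.
class GuardedWaitSketch {
    // Plays the role of runningJobs: both the monitor and the guarded data.
    private final TreeMap<String, Boolean> runningJobs =
        new TreeMap<String, Boolean>();

    // Stand-in for reducesInShuffle().size(): counts entries flagged as shuffling.
    // Callers must hold the runningJobs monitor.
    private int reducesInShuffle() {
        int n = 0;
        for (Boolean inShuffle : runningJobs.values()) {
            if (inShuffle) {
                n++;
            }
        }
        return n;
    }

    // Waiter side: re-check the predicate in a loop while holding the monitor.
    // Returns the number of reduces seen, or -1 if timeoutMs elapsed first.
    // The timeout is an assumption of this sketch, not part of the original loop.
    int awaitReduces(long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (runningJobs) {
            int n;
            while ((n = reducesInShuffle()) == 0) {
                long left = deadline - System.currentTimeMillis();
                if (left <= 0) {
                    return -1;
                }
                runningJobs.wait(left);
            }
            return n;
        }
    }

    // Notifier side: update the map and notify under the same monitor,
    // so the waiter either sees the new entry or receives the notification.
    void addReduce(String taskId) {
        synchronized (runningJobs) {
            runningJobs.put(taskId, Boolean.TRUE);
            runningJobs.notifyAll();
        }
    }

    public static void main(String[] args) throws Exception {
        final GuardedWaitSketch s = new GuardedWaitSketch();
        Thread t = new Thread(new Runnable() {
            public void run() {
                s.addReduce("r_000006_0");
            }
        });
        t.start();
        System.out.println(s.awaitReduces(5000));
        t.join();
    }
}
```

In this sketch a wake-up can only be lost if some code path changes what the 
predicate reads without holding the monitor or without calling notify, which 
is the kind of path worth looking for in the real code.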

Unless I am missing something, a bad data structure could be causing the 
issue.


> reducers stuck at shuffling 
> ----------------------------
>
>                 Key: HADOOP-3155
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3155
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Runping Qi
>            Assignee: Amareshwari Sriramadasu
>         Attachments: events-job_200807311630_0007.txt, 
> hadoop-3155-logs.tar.gz, patch-3155-debug-0.16.txt, 
> patch-3155-debug-0.17.txt, task_200807311630_0007_r_000006_0.syslog.gz
>
>
> This happened with hadoop-0.16.2:
> In relatively small job (a few hundreds of mappers and reducers), reducers 
> were stuck at shuffling.
> I saw the lines like the following repeated hundreds of thousands of times 
> over a few hours:
> 2008-04-02 17:17:44,640 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:44,641 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:44,641 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:17:44,641 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:17:46,643 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:46,643 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:46,643 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:17:46,643 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:17:48,645 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:48,645 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:48,645 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:17:48,645 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:17:50,647 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:50,647 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:50,647 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:17:50,647 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:17:52,649 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:52,650 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:52,650 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:17:52,650 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:17:54,651 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:54,652 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:54,652 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:17:54,652 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:17:56,654 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:56,654 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:56,654 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:17:56,654 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:17:58,656 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:17:58,656 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:17:58,656 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:17:58,656 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:18:00,658 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:00,658 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:00,658 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:18:00,658 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:18:02,660 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:02,661 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:02,661 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:18:02,661 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:18:04,662 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:04,663 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:04,663 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:18:04,663 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:18:06,664 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:06,665 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:06,665 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:18:06,665 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:18:08,667 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:08,667 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:08,667 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:18:08,667 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:18:10,669 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:10,669 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:10,669 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:18:10,669 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:18:12,671 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:12,671 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:12,671 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:18:12,671 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:18:14,673 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:14,674 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:14,674 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:18:14,674 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:18:16,675 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:16,676 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:16,676 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:18:16,676 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:18:18,678 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:18,678 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:18,678 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:18:18,678 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:18:20,680 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:20,680 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:20,680 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...
> 2008-04-02 17:18:20,680 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Scheduled 0 of 0 known outputs (0 slow 
> hosts and 0 dup hosts)
> 2008-04-02 17:18:22,682 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Need 2 map output(s)
> 2008-04-02 17:18:22,682 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0: Got 0 new map-outputs & 0 obsolete 
> map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-04-02 17:18:22,682 INFO org.apache.hadoop.mapred.ReduceTask: 
> task_200804021200_0337_r_000008_0 Got 0 known map output location(s); 
> scheduling...

