[ https://issues.apache.org/jira/browse/HADOOP-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620938#action_12620938 ]
Devaraj Das commented on HADOOP-3915:
-------------------------------------
Andreas, the kill failed because you specified the TIP ID rather than the
task ID. As for the reducers stuck on fetching events, this looks like a case
of HADOOP-3155. We are still trying to find the problem that leads to it. In
the meantime, if this happens again, could you please send us the output of
$ bin/hadoop job -events <jobId> 0 1000000
Also, please apply the patch posted on HADOOP-3155
(https://issues.apache.org/jira/secure/attachment/12387565/patch-3155-debug-0.17.txt).
If the problem happens again, please attach the log of a tasktracker on which
the reduces get stuck (logs from two such trackers if you can), and the log of
a tasktracker (just one is enough) on which reduces succeed.
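
A minimal sketch of how the -events output requested above could be saved to a
file for attaching to the issue. The HADOOP_BIN variable, the collect_events
function name, and the default output file name are assumptions made for
illustration; only the "job -events <jobId> 0 1000000" invocation comes from
this comment.

```shell
#!/bin/sh
# Path to the hadoop launcher (assumed; adjust to your installation).
HADOOP_BIN="${HADOOP_BIN:-bin/hadoop}"

# collect_events <jobId> [outFile]
# Hypothetical helper: runs the -events command from this comment and
# stores stdout and stderr in a file, then prints the file name.
collect_events() {
    job_id="$1"
    out_file="${2:-events-$job_id.txt}"
    # Events 0 through 1000000, as requested in the comment.
    "$HADOOP_BIN" job -events "$job_id" 0 1000000 > "$out_file" 2>&1
    echo "$out_file"
}
```

Usage would be "collect_events <jobId>", run from the Hadoop installation
directory while the reduces are stuck.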
I did notice from your logs that the JT was restarted while the TTs were up,
and I wonder whether that is somehow related to the issue described in
HADOOP-3155.
> reducers hang, jobtracker losing completely track of them.
> ----------------------------------------------------------
>
> Key: HADOOP-3915
> URL: https://issues.apache.org/jira/browse/HADOOP-3915
> Project: Hadoop Core
> Issue Type: Bug
> Affects Versions: 0.17.1
> Environment: EC2, Debian Etch (but not the ec2-contrib stuff)
> streaming.jar
> Reporter: Andreas Kostyrka
> Fix For: 0.17.2
>
> Attachments:
> hadoop-hadoop-jobtracker-ec2-67-202-58-97.compute-1.amazonaws.com.log
>
>
> I just noticed the following curious situation:
> -) 18 of 22 reducers are waiting for 3 hours or so with 0.01MB/s and no
> progress.
> -) hadoop job -kill-task does not work on the ids shown
> -) killing all reduce work tasks (the spawned Python processes, not java
> TaskTracker$Child) gets completely ignored by the JobTracker, the jobtracker
> shows them still as running.