[ 
https://issues.apache.org/jira/browse/HADOOP-3915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620938#action_12620938
 ] 

Devaraj Das commented on HADOOP-3915:
-------------------------------------

Andreas, the kill failed because you specified the TIP ID rather 
than the task ID. As for the reducers stuck fetching events, this looks like 
a case of HADOOP-3155. We are still trying to find the underlying cause. 
In the meantime, if this happens again, could you please send us the 
output of
$ bin/hadoop job -events <jobId> 0 1000000
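
To make that output easy to attach to this issue, the command above could be 
wrapped so that it writes to a file. This is only a sketch: the HADOOP_CMD 
variable, the dump_job_events function name and the output file name are 
illustrative assumptions, not part of Hadoop.

```shell
# Sketch: save a job's task-completion events to a file for attaching to the issue.
# HADOOP_CMD and the default output file name are illustrative assumptions.
HADOOP_CMD="${HADOOP_CMD:-bin/hadoop}"

dump_job_events() {
    job_id="$1"
    out_file="${2:-events-${job_id}.txt}"
    # After the job ID: the first event number, then the maximum number of events.
    "$HADOOP_CMD" job -events "$job_id" 0 1000000 > "$out_file" 2>&1
    # Print the file name so the caller knows what to attach.
    echo "$out_file"
}

# Usage, run from the Hadoop install directory with the real job ID:
#   dump_job_events <jobId>
```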

Also, please apply the patch posted on HADOOP-3155 
(https://issues.apache.org/jira/secure/attachment/12387565/patch-3155-debug-0.17.txt).
If the problem recurs, please attach the log of a tasktracker on which the 
reduces get stuck (logs from two such trackers if you can), and the log of 
one tasktracker on which reduces succeed.

I did notice from your logs that the JobTracker was restarted while the 
TaskTrackers were up, and I wonder whether that is somehow related to the 
issue described in HADOOP-3155.

> reducers hang, jobtracker loosing completely track of them.
> -----------------------------------------------------------
>
>                 Key: HADOOP-3915
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3915
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.17.1
>         Environment: EC2, Debian Etch  (but not the ec2-contrib stuff)
> streaming.jar
>            Reporter: Andreas Kostyrka
>             Fix For: 0.17.2
>
>         Attachments: 
> hadoop-hadoop-jobtracker-ec2-67-202-58-97.compute-1.amazonaws.com.log
>
>
> I just noticed the following curious situation:
> -) 18 of 22 reducers are waiting for 3 hours or so with 0.01MB/s and no 
> progress.
> -) hadoop job -kill-task does not work on the ids shown
> -) killing all reduce work tasks (the spawned Python processes, not java 
> TaskTracker$Child) gets completely ignored by the JobTracker, the jobtracker 
> shows them still as running.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
