[ 
https://issues.apache.org/jira/browse/HADOOP-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley resolved HADOOP-924.
----------------------------------

       Resolution: Duplicate
    Fix Version/s: 0.12.1

This looks like a duplicate of HADOOP-1060.

> Map task is not getting rescheduled although the corresponding TT got lost
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-924
>                 URL: https://issues.apache.org/jira/browse/HADOOP-924
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Devaraj Das
>             Fix For: 0.12.1
>
>
> I encountered this "job hung" situation during one of the sort runs. Two
> tasks assigned to a TT were never rescheduled although the TT was lost, and
> this left the job stuck forever. This TT had been assigned many tasks, and
> all of them were rescheduled except these two. Here are the relevant log
> messages for one of the tasks (the JT log below has been split into two
> parts to bring out the sequence of events).
> JT log:
> ---------
> 2007-01-24 10:53:09,564 INFO org.apache.hadoop.mapred.JobInProgress: Choosing normal task tip_0001_m_020699
> 2007-01-24 10:53:09,564 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'task_0001_m_020699_0' to tip tip_0001_m_020699, for tracker 'foo.com:7020'
> TT log:
> ---------
> 2007-01-24 10:53:09,564 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction: task_0001_m_020699_0
> 2007-01-24 10:53:12,180 INFO org.apache.hadoop.mapred.TaskTracker: task_0001_m_020699_0 0.0% hdfs://foo:50000/user/ddas/somedir/part002444:134217728+134217728
> JT log:
> ---------
> 2007-01-24 11:05:32,409 INFO org.apache.hadoop.mapred.JobTracker: Lost tracker 'foo.com:7020'
>
> It looks like there is a race condition. Since only two of the many tasks
> were never rescheduled, the JT may somehow have been unaware of the state of
> these two tasks after it assigned them to the (soon-to-be-lost) TT (did they
> get added to the relevant tables properly?).
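
To make the suspected race concrete, here is a minimal sketch of the kind of bookkeeping the description hints at. It is not the actual JobTracker code: the class LostTrackerSketch, its fields, and its methods are hypothetical names chosen for illustration. The assumption is that the JT keeps a table of task attempts per tracker and walks that table when a tracker is declared lost. If recording an assignment and handling a lost tracker are not serialized, a task recorded after the walk would never be rescheduled. The sketch guards both paths with one lock and rejects assignments to a tracker already marked lost.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical, simplified stand-in for the JT's task tables; not Hadoop code. */
class LostTrackerSketch {
    // tracker name -> task attempts the JT believes are assigned to it
    private final Map<String, Set<String>> tasksByTracker = new HashMap<>();
    // trackers that have already been declared lost
    private final Set<String> lostTrackers = new HashSet<>();

    /**
     * Records an assignment. Returns false if the tracker is already lost,
     * so the caller knows the task must be scheduled somewhere else.
     */
    public synchronized boolean assignTask(String tracker, String taskId) {
        if (lostTrackers.contains(tracker)) {
            return false;
        }
        tasksByTracker.computeIfAbsent(tracker, k -> new TreeSet<>()).add(taskId);
        return true;
    }

    /**
     * Marks the tracker as lost and returns every task recorded for it.
     * Because this method shares a lock with assignTask, an assignment is
     * either in the returned list or rejected afterwards; none can slip by.
     */
    public synchronized List<String> trackerLost(String tracker) {
        lostTrackers.add(tracker);
        Set<String> tasks = tasksByTracker.remove(tracker);
        return tasks == null ? new ArrayList<>() : new ArrayList<>(tasks);
    }

    public static void main(String[] args) {
        LostTrackerSketch jt = new LostTrackerSketch();
        jt.assignTask("foo.com:7020", "task_0001_m_020699_0");
        // Tasks returned here are the ones that need to be rescheduled.
        System.out.println("reschedule: " + jt.trackerLost("foo.com:7020"));
        // A late assignment to the lost tracker is refused instead of orphaned.
        System.out.println("late assign accepted: "
                + jt.assignTask("foo.com:7020", "task_0001_m_020699_1"));
    }
}
```

The tracker name and the first task id are taken from the logs above; the second task id is made up for the example. Whether the real bug had this shape is not established by the report.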

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.