[ 
https://issues.apache.org/jira/browse/HADOOP-2119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579813#action_12579813
 ] 

Amar Kamat commented on HADOOP-2119:
------------------------------------

Some comments on the attached patch:
1) It uses the hostname rather than the tracker name to detect whether a TIP 
failed on a machine. This becomes an issue when two trackers run on the same 
node, e.g. in the ant tests, and is why some of the tests failed.
2) The list of ancestors maintained at the JT can be incomplete, leading to 
stuck jobs. This can happen if nodes run only datanodes and no trackers.
3) The {{isJobComplete}} logic is broken. It should also consider failed TIPs.
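
To illustrate point 1, here is a minimal sketch of why keying failures by 
hostname conflates two trackers on one node. The class, its methods, and the 
tracker names are hypothetical and are not taken from the patch or from the 
Hadoop source.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; not the actual Hadoop classes or fields.
final class TipFailureLog {
    // Keyed by host name: two trackers on one host share a single entry.
    private final Set<String> failedHosts = new HashSet<>();
    // Keyed by tracker name: each tracker keeps its own entry.
    private final Set<String> failedTrackers = new HashSet<>();

    // Record that an attempt of this TIP failed on the given tracker.
    void recordFailure(String trackerName, String hostName) {
        failedHosts.add(hostName);
        failedTrackers.add(trackerName);
    }

    // Host-based check: true for every tracker that shares the host.
    boolean hasFailedOnHost(String hostName) {
        return failedHosts.contains(hostName);
    }

    // Tracker-based check: true only for the tracker that saw the failure.
    boolean hasFailedOnTracker(String trackerName) {
        return failedTrackers.contains(trackerName);
    }
}
```

If made-up trackers "tracker_A" and "tracker_B" both run on "localhost" and 
the TIP fails only on "tracker_A", the host-based check also reports a failure 
for "tracker_B", while the tracker-based check does not.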
----
Also, {{JobInProgress.isJobComplete()}} now depends on {{failedMapTIPs}} and 
{{failedReduceTIPs}}. The patch fixes how {{failedTask}} updates 
{{failedMapTIPs}}/{{failedReduceTIPs}}, which was broken in cases where a TIP 
has a speculative task. 
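
A rough sketch of a completion check that also counts failed TIPs follows. 
Only the names {{failedMapTIPs}} and {{failedReduceTIPs}} come from the comment 
above. The class, the other fields, and the exact condition are assumptions 
for illustration and may differ from what the patch does.

```java
// Hypothetical sketch; the completion rule below is an assumption.
final class JobCompletionSketch {
    private final int numMapTasks;
    private final int numReduceTasks;

    int finishedMapTasks;
    int finishedReduceTasks;
    int failedMapTIPs;     // TIPs that failed for good, counted once each
    int failedReduceTIPs;

    JobCompletionSketch(int numMapTasks, int numReduceTasks) {
        this.numMapTasks = numMapTasks;
        this.numReduceTasks = numReduceTasks;
    }

    // Assumed rule: every TIP has either finished or failed for good.
    boolean isJobComplete() {
        return finishedMapTasks + failedMapTIPs >= numMapTasks
            && finishedReduceTasks + failedReduceTIPs >= numReduceTasks;
    }
}
```

Under this assumed rule, a check that looked only at the finished counters 
would never report completion for a job in which one TIP failed permanently.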


> JobTracker becomes non-responsive if the task trackers finish task too fast
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-2119
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2119
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Runping Qi
>            Assignee: Amar Kamat
>            Priority: Critical
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-2119-v4.1.patch, hadoop-2119.patch, 
> hadoop-jobtracker-thread-dump.txt
>
>
> I ran a job with 0 reducers on a cluster with 390 nodes.
> The mappers ran very fast.
> The job tracker lagged behind on committing completed mapper tasks.
> The number of running mappers displayed on the web UI kept getting bigger.
> The job tracker eventually stopped responding to the web UI.
> No progress was reported afterwards.
> The job tracker was running on a separate node.
> The job tracker process consumed 100% CPU, with a VM size of 1.01g (reaching 
> the heap space limit).

