[jira] Commented: (HADOOP-1930) Too many fetch-failures issue

Devaraj Das (JIRA) Mon, 24 Sep 2007 06:05:15 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12529851 ]


Devaraj Das commented on HADOOP-1930:
-------------------------------------

If a task is not found on any of the tasktrackers (getAssignedTracker returns 
null), the patch reports the tasktracker as "unknown" in the message. 
For readability, it might make more sense to report the tasktracker as "lost": 
once the JT has declared the task represented by that taskid successful, a 
lost tasktracker is the only case in which getAssignedTracker would return 
null.
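
To illustrate the suggestion, here is a minimal sketch of the labelling logic. 
It is not the code from the patch: the class, the map, and the assign, 
trackerLost and describeTracker methods are hypothetical stand-ins, and only 
the name getAssignedTracker and its null return come from the discussion above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the jobtracker's taskid -> tasktracker bookkeeping.
final class TrackerLabelSketch {
    private final Map<String, String> taskToTracker = new HashMap<>();

    // Record that a task was assigned to a tracker (assumed helper).
    void assign(String taskId, String trackerName) {
        taskToTracker.put(taskId, trackerName);
    }

    // Drop every task mapped to a tracker that has been declared lost (assumed helper).
    void trackerLost(String trackerName) {
        taskToTracker.values().removeIf(trackerName::equals);
    }

    // Mirrors the behaviour described in the comment: null when no tracker holds the task.
    String getAssignedTracker(String taskId) {
        return taskToTracker.get(taskId);
    }

    // For a task already declared successful, a null tracker is reported
    // as "lost" rather than "unknown".
    String describeTracker(String taskId) {
        String tracker = getAssignedTracker(taskId);
        return tracker != null ? tracker : "lost";
    }
}
```

The sketch assumes describeTracker is only called for tasks the JT has already 
declared successful; for a taskid that was never assigned, "lost" would be 
misleading.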

> Too many fetch-failures issue
> -----------------------------
>
>                 Key: HADOOP-1930
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1930
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.15.0
>            Reporter: Christian Kunz
>            Assignee: Arun C Murthy
>            Priority: Blocker
>         Attachments: HADOOP-1930_1_20070922.patch
>
>
> A job with 4000 maps on a 1400 node cluster (3 tasks per node allowed) had 
> 150 'Too many fetch-failures' map failures.
> From the jobtracker log it looks as if the jobtracker got confused about 
> which tasktracker actually ran the task:
> (In the following log output, I replaced the corresponding tasktracker nodes 
> with ***node_assigned*** and ***node_fetch_attempt***; they are different.)
> grep task_200709170247_0018_m_000009_0 
> hadoop-xxx-jobtracker-node.log.2007-09-19:
> 2007-09-19 15:52:26,907 INFO org.apache.hadoop.mapred.JobTracker: Adding task 
> 'task_200709170247_0018_m_000009_0' to tip tip_200709170247_0018_m_000009, 
> for tracker 'tracker_***node_assigned_***:/127.0.0.1:54523'
> 2007-09-19 15:58:03,111 INFO org.apache.hadoop.mapred.TaskRunner: Saved 
> output of task 'task_200709170247_0018_m_000009_0' to hdfs://location
> 2007-09-19 15:58:03,111 INFO org.apache.hadoop.mapred.JobInProgress: Task 
> 'task_200709170247_0018_m_000009_0' has completed 
> tip_200709170247_0018_m_000009 successfully.
> 2007-09-19 15:58:03,111 INFO org.apache.hadoop.mapred.TaskInProgress: Task 
> 'task_200709170247_0018_m_000009_0' has completed succesfully
> 2007-09-19 16:21:07,825 INFO org.apache.hadoop.mapred.JobInProgress: Failed 
> fetch notification #1 for task task_200709170247_0018_m_000009_0
> 2007-09-19 16:23:23,483 INFO org.apache.hadoop.mapred.JobInProgress: Failed 
> fetch notification #2 for task task_200709170247_0018_m_000009_0
> 2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.JobInProgress: Failed 
> fetch notification #3 for task task_200709170247_0018_m_000009_0
> 2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.JobInProgress: Too many 
> fetch-failures for output of task: task_200709170247_0018_m_000009_0 ... 
> killing it
> 2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.TaskInProgress: Error 
> from task_200709170247_0018_m_000009_0: Too many fetch-failures
> 2007-09-19 16:25:07,182 INFO org.apache.hadoop.mapred.TaskInProgress: Task 
> 'task_200709170247_0018_m_000009_0' has been lost.
> 2007-09-19 16:25:07,184 INFO org.apache.hadoop.mapred.JobTracker: Removed 
> completed task 'task_200709170247_0018_m_000009_0' from 
> 'tracker_***node_fetch_attempt***:/127.0.0.1:48818'
> 2007-09-19 21:40:00,235 INFO org.apache.hadoop.mapred.JobTracker: Removed 
> completed task 'task_200709170247_0018_m_000009_0' from 
> 'tracker_***node_fetch_attempt***:/127.0.0.1:48818'
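
The mismatch visible in the excerpt above can be checked mechanically. The 
following is a rough sketch, assuming each log entry has been joined back onto 
a single line and that the "Adding task" and "Removed completed task" messages 
have exactly the wording shown in the excerpt. The class and method names are 
made up for illustration and are not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: flags tasks removed from a tracker other than the one assigned.
final class TrackerMismatchSketch {
    // Message shapes copied from the excerpt; other Hadoop versions may log differently.
    private static final Pattern ADDED =
        Pattern.compile("Adding task '([^']+)' to tip \\S+ for tracker '([^']+)'");
    private static final Pattern REMOVED =
        Pattern.compile("Removed completed task '([^']+)' from '([^']+)'");

    static List<String> findMismatches(List<String> logLines) {
        Map<String, String> assigned = new HashMap<>();
        List<String> mismatches = new ArrayList<>();
        for (String line : logLines) {
            Matcher added = ADDED.matcher(line);
            if (added.find()) {
                // Remember which tracker the jobtracker assigned the task to.
                assigned.put(added.group(1), added.group(2));
                continue;
            }
            Matcher removed = REMOVED.matcher(line);
            if (removed.find()) {
                String expected = assigned.get(removed.group(1));
                // Report only when an assignment was seen and the tracker names differ.
                if (expected != null && !expected.equals(removed.group(2))) {
                    mismatches.add(removed.group(1) + ": assigned to " + expected
                        + ", removed from " + removed.group(2));
                }
            }
        }
        return mismatches;
    }
}
```

The comparison is on the full tracker string, port included, so a tasktracker 
that restarted on the same node with a different port would also be reported.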

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
