[jira] Commented: (HADOOP-1431) Map tasks can't timeout for failing to call progress

Vivek Ratan (JIRA) Tue, 29 May 2007 05:57:39 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499786
 ]


Vivek Ratan commented on HADOOP-1431:
-------------------------------------

As part of a good solution (for 0.14 or later), I think we should separate the 
reporting of progress by the sort/merge/user code from the reporting of 
progress by the Task to the TaskTracker. 

For the former, we make the Reporter object available to the MapReduce kernel 
code, as Devaraj suggested, and at other appropriate places as discussed in 
this conversation. Wherever progress is made that we need to report (during 
sort or merge or whatever), the kernel code or the user's code calls the 
Reporter object. 

Separately, for the latter, we probably should continue with the Progress 
thread. This thread looks at the Progress data structures and sends progress 
info to the TaskTracker via RPC. To avoid the problem that this bug was filed 
for, we have two likely options: 
1. The thread continues doing what it does now: it sends the progress 
information at regular intervals and the TaskTracker decides whether the task 
has really made progress, based on what it got earlier. Or
2. The thread decides whether progress has really been made and makes an RPC 
call only if necessary. Even if progress is not made, it may make a call if we 
eliminate the Ping thread (see issue 1201) to prevent the TaskTracker from 
killing the task. 
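
Option 2 could be sketched roughly as follows. This is only an illustration, 
not the code in the attached patch: LocalProgress, TrackerLink, ProgressSender 
and the pingWhenIdle flag are all hypothetical names, and TrackerLink merely 
stands in for whatever RPC interface the Task uses to talk to the TaskTracker. 
The sort/merge/user code records progress locally; one tick of the progress 
thread then decides whether an RPC is needed at all.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the Task -> TaskTracker RPC interface. */
interface TrackerLink {
    void reportProgress(float progress); // real progress was made
    void ping();                         // liveness only, no progress
}

/** Local side: written by sort/merge/user code through a Reporter-like call. */
final class LocalProgress {
    private final AtomicBoolean dirty = new AtomicBoolean(false);
    private volatile float value = 0f;

    /** Record new progress and mark it as not yet sent. */
    void set(float v) {
        value = v;
        dirty.set(true);
    }

    /** Returns true once per batch of updates, then clears the flag. */
    boolean takeDirty() {
        return dirty.getAndSet(false);
    }

    float get() {
        return value;
    }
}

/** Remote side: the body of one iteration of the progress thread. */
final class ProgressSender {
    private final LocalProgress local;
    private final TrackerLink link;
    private final boolean pingWhenIdle; // assumed true if the Ping thread is removed

    ProgressSender(LocalProgress local, TrackerLink link, boolean pingWhenIdle) {
        this.local = local;
        this.link = link;
        this.pingWhenIdle = pingWhenIdle;
    }

    /** Makes at most one RPC; returns true if a call was made. */
    boolean tick() {
        if (local.takeDirty()) {
            link.reportProgress(local.get());
            return true;
        }
        if (pingWhenIdle) {
            link.ping();
            return true;
        }
        return false;
    }
}
```

A real thread would call tick() at a regular interval. Because a ping carries 
no progress, the TaskTracker could still time out a task that is alive but 
stuck.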

Option 2 is probably the better choice, as the logic to decide whether progress 
has been made may be easier to implement in the thread than in the 
TaskTracker. As discussed earlier in this conversation, we may resume/suspend 
the thread, or at least make sure we start and stop it at the right places. But 
I'd suggest we separate the issue of reporting progress locally (via the 
Reporter object) from reporting progress to the TaskTracker (via a thread). The 
logic for the two issues is different, and separating the code will make things 
cleaner and easier to change. 

> Map tasks can't timeout for failing to call progress
> ----------------------------------------------------
>
>                 Key: HADOOP-1431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1431
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1431_1_20070525.patch
>
>
> Currently the map task runner creates a thread that calls progress every 
> second to keep the system from killing the map if the sort takes too long. 
> This is the wrong approach, because it will cause stuck tasks to not be 
> killed. The right solution is to have the sort call progress as it actually 
> makes progress. This is part of what is going on in HADOOP-1374. A map gets 
> stuck at 100% progress, but not done.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
