[
https://issues.apache.org/jira/browse/HADOOP-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499786
]
Vivek Ratan commented on HADOOP-1431:
-------------------------------------
As part of a good solution (for 0.14 or later), I think we should separate the
reporting of progress by the sort/merge/user code from the reporting of
progress from the Task to the TaskTracker.
For the former, we make the Reporter object available to the MapReduce kernel
code, as Devaraj suggested, and at other appropriate places as discussed in
this conversation. Wherever progress is made that we need to report (during
sort or merge or whatever), the kernel code or the user's code calls the
Reporter object.
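
As a rough illustration of the local half (not taken from any patch on this
issue), the sketch below shows merge code calling a reporter only when it has
actually emitted a record. ProgressReporter, CountingReporter and
ReportingMerge are hypothetical names used for the example; ProgressReporter
is a stand-in for the Reporter object, not the real Hadoop interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the Reporter object; not the actual Hadoop interface. */
interface ProgressReporter {
    void progress();
}

/** Counts calls so that a caller can tell whether work is advancing. */
class CountingReporter implements ProgressReporter {
    private long calls;

    @Override
    public synchronized void progress() {
        calls++;
    }

    public synchronized long getCalls() {
        return calls;
    }
}

/** Toy two-way merge standing in for the sort/merge kernel code. */
class ReportingMerge {
    static List<Integer> merge(List<Integer> a, List<Integer> b, ProgressReporter reporter) {
        List<Integer> out = new ArrayList<>(a.size() + b.size());
        int i = 0;
        int j = 0;
        while (i < a.size() || j < b.size()) {
            // Take from a when b is exhausted or a's head is not larger than b's head.
            boolean takeA = j >= b.size() || (i < a.size() && a.get(i) <= b.get(j));
            out.add(takeA ? a.get(i++) : b.get(j++));
            reporter.progress(); // reported only after a record has really been emitted
        }
        return out;
    }
}
```

If the merge loop hangs, no further progress() calls are made, which is the
property the timer-driven approach lacks.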
Separately, for the latter, we probably should continue with the Progress
thread. This thread looks at the Progress data structures and sends progress
info to the TaskTracker via RPC. To avoid the problem that this bug was filed
for, we have two likely options:
1. The thread continues doing what it does now: it sends the progress
information at regular intervals and the TaskTracker decides whether the task
has really made progress, based on what it got earlier. Or
2. The thread decides whether progress has really been made and makes an RPC
call only if necessary. Even if progress is not made, it may make a call if we
eliminate the Ping thread (see issue 1201) to prevent the TaskTracker from
killing the task.
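
To make option 1 concrete, here is a minimal sketch of the tracker-side check,
assuming the TaskTracker keeps the last progress value it received for each
task along with the time that value last changed. TaskProgressMonitor and its
methods are hypothetical names, not existing TaskTracker code.

```java
/** Hypothetical tracker-side bookkeeping for one task; names are illustrative. */
class TaskProgressMonitor {
    private final long timeoutMillis;
    private float lastProgress = -1f; // no real progress value is negative
    private long lastChangeMillis;

    TaskProgressMonitor(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastChangeMillis = startMillis;
    }

    /** Called for every status update; only a changed value resets the clock. */
    synchronized void onStatusUpdate(float progress, long nowMillis) {
        if (progress != lastProgress) {
            lastProgress = progress;
            lastChangeMillis = nowMillis;
        }
    }

    /** True when no change has been seen for longer than the timeout. */
    synchronized boolean isStuck(long nowMillis) {
        return nowMillis - lastChangeMillis > timeoutMillis;
    }
}
```

This only works if the reported value changes whenever real work happens. A
sort that runs for a long time without moving the progress fraction would
look stuck under this check, which is one reason the decision may be harder
to make on the TaskTracker side.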
The second option is probably better, as the logic to decide whether progress
has been made may be easier to implement in the thread than in the
TaskTracker. As discussed earlier in this conversation, we may resume/suspend
the thread, or at least make sure we start and stop it at the right places. But
I'd suggest we separate the issue of reporting progress locally (via the
Reporter object) from reporting progress to the TaskTracker (via a thread). The
logic for the two issues is different and separating the code will make things
cleaner and easier to change.
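
One possible shape for this separation, with the thread deciding whether to
make the RPC call, is sketched below. All names (LocalProgress, TrackerLink,
ProgressSender) are hypothetical, and TrackerLink stands in for whatever RPC
interface the task uses to reach the TaskTracker. The pingWhenIdle flag models
the case where the Ping thread has been removed and the progress thread has to
keep the task alive itself; treating that call as distinguishable from a
progress report is an assumption of the sketch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Local side: what the sort, merge, or user code calls. Hypothetical name. */
class LocalProgress {
    private final AtomicBoolean dirty = new AtomicBoolean(false);

    void progress() {
        dirty.set(true);
    }

    /** Returns whether progress was recorded since the last call, and clears the flag. */
    boolean getAndClear() {
        return dirty.getAndSet(false);
    }
}

/** Hypothetical stand-in for the RPC interface to the TaskTracker. */
interface TrackerLink {
    void sendProgress();
    void sendPing();
}

/** Remote side: runs in its own thread and decides what, if anything, to send. */
class ProgressSender implements Runnable {
    private final LocalProgress local;
    private final TrackerLink link;
    private final long intervalMillis;
    private final boolean pingWhenIdle;

    ProgressSender(LocalProgress local, TrackerLink link,
                   long intervalMillis, boolean pingWhenIdle) {
        this.local = local;
        this.link = link;
        this.intervalMillis = intervalMillis;
        this.pingWhenIdle = pingWhenIdle;
    }

    /** One polling step, kept separate from the loop so it can be exercised directly. */
    void tick() {
        if (local.getAndClear()) {
            link.sendProgress();
        } else if (pingWhenIdle) {
            link.sendPing(); // keep-alive only; assumed not to count as progress
        }
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                tick();
                Thread.sleep(intervalMillis);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop was requested
        }
    }
}
```

Starting the sender with new Thread(sender).start() and stopping it with
interrupt() covers the start/stop case; suspend/resume would need an extra
flag that is not shown here.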
> Map tasks can't timeout for failing to call progress
> ----------------------------------------------------
>
> Key: HADOOP-1431
> URL: https://issues.apache.org/jira/browse/HADOOP-1431
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.13.0
> Reporter: Owen O'Malley
> Assignee: Arun C Murthy
> Fix For: 0.13.0
>
> Attachments: HADOOP-1431_1_20070525.patch
>
>
> Currently the map task runner creates a thread that calls progress every
> second to keep the system from killing the map if the sort takes too long.
> This is the wrong approach, because it will cause stuck tasks to not be
> killed. The right solution is to have the sort call progress as it actually
> makes progress. This is part of what is going on in HADOOP-1374. A map gets
> stuck at 100% progress, but not done.
--
This message is automatically generated by JIRA.