[ https://issues.apache.org/jira/browse/HADOOP-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500778 ]
Raghu Angadi commented on HADOOP-1431:
--------------------------------------
Doug's comment that was posted to HADOOP-1134 by mistake:
{quote}
Calvin Yu noted on hadoop-user that join() seems to sometimes hang even if the
thread has been interrupted. In other places we use the idiom of a 'running'
flag that's checked in a thread's loop in conjunction with an interrupt, rather
than interrupt+join, and that seems to be reliable. So I think we should switch
to that here too.
Also, in the current patch, I don't see why the thread is held in a field. I
worry that someone might add code like 'if (sortProgressThread == null) ...',
and that we might somehow not always null this field. If it is kept in a local
variable around the call then this is much less of a risk.
So I think we should convert the createProgressThread method to a nested class
whose constructor starts the thread and which has a stop() method that sets a
flag. It would also be good if the 'try' block could be shared between
'collect()' and 'flush()'. I think this calls for a new method something like:
  private void sortWithProgress() {
    ProgressThread progress = new ProgressThread();
    try { sortAndSpillToDisk(); } finally { progress.stop(); }
  }
{quote}
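For reference, a minimal sketch of the 'running' flag plus interrupt idiom described in the quote, written as a nested class whose constructor starts the thread. This is not the code from any of the attached patches. The names progressCalls and intervalMillis, and the body of sortAndSpillToDisk(), are placeholders invented for illustration; the real thread would call the task's progress reporter instead of bumping a counter.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SortWithProgress {
    /** Hypothetical stand-in for the progress reporter the real task would call. */
    public static final AtomicInteger progressCalls = new AtomicInteger();

    /** Nested class: the constructor starts the thread, stop() clears the flag. */
    public static class ProgressThread implements Runnable {
        // volatile so the write in stop() is visible to the loop in run()
        private volatile boolean running = true;
        private final long intervalMillis;
        private final Thread thread;

        public ProgressThread(long intervalMillis) {
            this.intervalMillis = intervalMillis;
            this.thread = new Thread(this, "sort-progress");
            this.thread.setDaemon(true);
            // Started last, after every field has been assigned.
            this.thread.start();
        }

        @Override
        public void run() {
            while (running) {
                progressCalls.incrementAndGet(); // placeholder for the progress call
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    // Woken early by stop(); the loop condition decides whether to exit.
                }
            }
        }

        /** Asks the thread to exit without joining it. */
        public void stop() {
            running = false;
            thread.interrupt();
        }

        public boolean isAlive() {
            return thread.isAlive();
        }
    }

    /** Placeholder for the real sort; it just sleeps briefly. */
    static void sortAndSpillToDisk() throws InterruptedException {
        Thread.sleep(50);
    }

    /** Shared helper that both collect() and flush() could call. */
    public static ProgressThread sortWithProgress() throws InterruptedException {
        ProgressThread progress = new ProgressThread(10);
        try {
            sortAndSpillToDisk();
        } finally {
            progress.stop();
        }
        return progress;
    }
}
```

The thread lives only in a local variable inside sortWithProgress(), so there is no field that could be left non-null. The helper returns the ProgressThread here only so that a caller can check that it has exited; the method in the quote returns void.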
> Map tasks can't timeout for failing to call progress
> ----------------------------------------------------
>
> Key: HADOOP-1431
> URL: https://issues.apache.org/jira/browse/HADOOP-1431
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.13.0
> Reporter: Owen O'Malley
> Assignee: Arun C Murthy
> Priority: Blocker
> Fix For: 0.13.0
>
> Attachments: HADOOP-1431_1_20070525.patch,
> HADOOP-1431_2_20070530.patch, HADOOP-1431_3_20070601.patch
>
>
> Currently the map task runner creates a thread that calls progress every
> second to keep the system from killing the map if the sort takes too long.
> This is the wrong approach, because it will cause stuck tasks to not be
> killed. The right solution is to have the sort call progress as it actually
> makes progress. This is part of what is going on in HADOOP-1374: a map gets
> stuck at 100% progress but is never marked as done.
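
One way to have the sort report progress only as it actually does work is to hook the report into the comparisons themselves. The sketch below is an illustration under that assumption, not Hadoop's API: reporting(), interval and the Runnable callback are hypothetical names. If the sort hangs, the comparisons stop and so do the progress calls, which lets the task time out.

```java
import java.util.Comparator;

public class ProgressReportingSort {
    /**
     * Wraps a comparator so that every 'interval' comparisons trigger the
     * given progress callback. Not thread-safe; intended for a
     * single-threaded sort.
     */
    public static <T> Comparator<T> reporting(
            Comparator<T> inner, int interval, Runnable progress) {
        int[] count = {0}; // mutable counter captured by the lambda
        return (a, b) -> {
            if (++count[0] % interval == 0) {
                progress.run(); // called only while the sort is doing work
            }
            return inner.compare(a, b);
        };
    }
}
```

A caller would pass the wrapped comparator to its sort, for example list.sort(ProgressReportingSort.reporting(cmp, 1000, callback)), where callback is whatever reports progress for the task.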