[
https://issues.apache.org/jira/browse/HADOOP-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499169
]
Devaraj Das commented on HADOOP-1431:
-------------------------------------
Doug, I agree with you that this issue should be handled more generally as part
of HADOOP-1201, scheduled for 0.14. That's why I put a comment (the first
comment on this issue) to that effect when Owen raised the bug. I believe that
the sort progress reporting, as it is done today, has been working fine for
quite some time (many months, actually), and I can't remember what bug got
introduced there (sorry). The only reason the sort could get stuck is bad user
code in the Comparator, and I am not convinced that we would have handled that
issue completely without also handling the merge cases.
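To make the Comparator point concrete, here is a minimal sketch (not the actual
Hadoop code) of a sort that reports progress only when a comparison really
completes. The class name ProgressReportingComparator, the Runnable callback
and the interval parameter are all hypothetical names chosen for illustration.
If the user's compare() hangs, no further progress is reported, so a timeout
check would eventually notice the stuck task.

```java
import java.util.Comparator;

/**
 * Hypothetical wrapper (an assumption for illustration, not a Hadoop class):
 * delegates to the user's comparator and signals progress only after a
 * comparison has actually returned.
 */
class ProgressReportingComparator<T> implements Comparator<T> {
    private final Comparator<T> delegate; // user-supplied comparator
    private final Runnable progress;      // stand-in for a progress callback
    private final int interval;           // report once every `interval` comparisons
    private long comparisons = 0;

    ProgressReportingComparator(Comparator<T> delegate, Runnable progress, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.delegate = delegate;
        this.progress = progress;
        this.interval = interval;
    }

    @Override
    public int compare(T a, T b) {
        // If the user code hangs here, the line below is never reached and
        // progress reports stop, unlike with a timer thread.
        int result = delegate.compare(a, b);
        if (++comparisons % interval == 0) {
            progress.run();
        }
        return result;
    }

    /** Number of comparisons that have completed so far. */
    long comparisons() {
        return comparisons;
    }
}
```

A sort would then be invoked as, for example,
Arrays.sort(records, new ProgressReportingComparator<>(userComparator, callback, 1000)).
The merge phase would need the same treatment, which this sketch does not cover.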
On a side note, one problem that exists today is that the child Map/Reduce
processes sometimes (rarely on Linux), for some reason, don't exit even after
the map/reduce method invocations are over. TaskRunner.run() doesn't return, so
tracker.reportTaskFinished(t.getTaskId()) is never called, and the TaskTracker
finally kills the process after the timeout interval in the method
markUnresponsiveTasks.
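The timeout behaviour described above can be sketched roughly as follows. This
is a simplified illustration under my own assumptions, not the TaskTracker
implementation: the class UnresponsiveTaskDetector and its methods are
hypothetical, and time is passed in explicitly to keep the example
deterministic. A task that never reports completion stays tracked and is
flagged once its last progress report is older than the timeout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical detector (an assumption for illustration): remembers the last
 * progress time per task and lists tasks that have been silent too long.
 */
class UnresponsiveTaskDetector {
    private final long timeoutMillis;
    private final Map<String, Long> lastProgress = new HashMap<>();

    UnresponsiveTaskDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Record that the task made progress at the given time. */
    void reportProgress(String taskId, long nowMillis) {
        lastProgress.put(taskId, nowMillis);
    }

    /** A task that finished cleanly is no longer tracked. */
    void reportTaskFinished(String taskId) {
        lastProgress.remove(taskId);
    }

    /** Return the tasks whose last progress is older than the timeout. */
    List<String> findUnresponsive(long nowMillis) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastProgress.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }
}
```

In this sketch, a child process whose run loop never returns corresponds to a
task for which reportTaskFinished is never called, so it is only cleaned up
when findUnresponsive flags it.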
But again, I am happy if we agree that we should look at this issue in more
detail for 0.14 *smile*
> Map tasks can't timeout for failing to call progress
> ----------------------------------------------------
>
> Key: HADOOP-1431
> URL: https://issues.apache.org/jira/browse/HADOOP-1431
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.13.0
> Reporter: Owen O'Malley
> Assigned To: Arun C Murthy
> Fix For: 0.13.0
>
> Attachments: HADOOP-1431_1_20070525.patch
>
>
> Currently the map task runner creates a thread that calls progress every
> second to keep the system from killing the map if the sort takes too long.
> This is the wrong approach, because it will cause stuck tasks to not be
> killed. The right solution is to have the sort call progress as it actually
> makes progress. This is part of what is going on in HADOOP-1374. A map gets
> stuck at 100% progress, but not done.