[ https://issues.apache.org/jira/browse/HADOOP-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499169 ]

Devaraj Das commented on HADOOP-1431:
-------------------------------------

Doug, I agree with you that this issue should be handled more generally as 
part of HADOOP-1201, scheduled for 0.14. That's why I posted a comment to that 
effect (the first comment on this issue) when Owen filed the bug. I believe the 
sort progress reporting as it is done today has been working fine for quite 
some time (many months, actually), and I can't remember what bug got 
introduced there (sorry). The only way the sort could get stuck is through bad 
user code in the Comparator, and I am not convinced we would have handled that 
case completely without also handling the merge cases. 
On a side note, one problem that exists today is that the child Map/Reduce 
processes sometimes (rarely on Linux), for some unknown reason, don't exit even 
after the map/reduce method invocations are over: TaskRunner.run() doesn't 
return, so tracker.reportTaskFinished(t.getTaskId()) is never called, and the 
TaskTracker finally kills the task after the timeout interval in the method 
markUnresponsiveTasks.
But again, I'd be happy if we agree to look at this issue in more detail for 
0.14 *smile*
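To make the timeout path described above concrete, here is a minimal Java sketch, assuming a tracker that records the time of each task's last progress report and periodically sweeps for tasks that have been silent longer than a timeout. The class and method names (UnresponsiveTaskSweep, reportProgress, reportFinished, sweep) and the 10-minute timeout are hypothetical illustrations, not the actual TaskTracker code; only the idea of a timeout-based sweep like markUnresponsiveTasks comes from the comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Hadoop's TaskTracker): flag tasks whose last
// progress report is older than a timeout.
public class UnresponsiveTaskSweep {
    private final long timeoutMs;
    // task id -> timestamp (ms) of the most recent progress report
    private final Map<String, Long> lastProgress = new HashMap<>();

    public UnresponsiveTaskSweep(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Record a progress report from a task. */
    public void reportProgress(String taskId, long nowMs) {
        lastProgress.put(taskId, nowMs);
    }

    /** The task finished normally, so stop tracking it. */
    public void reportFinished(String taskId) {
        lastProgress.remove(taskId);
    }

    /** Return (and stop tracking) every task silent for longer than the timeout. */
    public List<String> sweep(long nowMs) {
        List<String> toKill = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastProgress.entrySet()) {
            if (nowMs - e.getValue() > timeoutMs) {
                toKill.add(e.getKey());
            }
        }
        for (String id : toKill) {
            lastProgress.remove(id);
        }
        return toKill;
    }

    public static void main(String[] args) {
        // Hypothetical 10-minute timeout.
        UnresponsiveTaskSweep sweep = new UnresponsiveTaskSweep(600_000);
        sweep.reportProgress("task_a", 0);
        sweep.reportProgress("task_b", 0);
        sweep.reportFinished("task_a"); // task_a exited cleanly
        // task_b's child process lingers and never reports that it finished.
        System.out.println(sweep.sweep(700_000)); // prints [task_b]
    }
}
```

In this model a task that never reports completion is reclaimed only when the sweep runs after the full timeout has elapsed, which matches the delay described in the comment.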

> Map tasks can't timeout for failing to call progress
> ----------------------------------------------------
>
>                 Key: HADOOP-1431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1431
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Owen O'Malley
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1431_1_20070525.patch
>
>
> Currently the map task runner creates a thread that calls progress every 
> second to keep the system from killing the map if the sort takes too long. 
> This is the wrong approach, because stuck tasks will then never be killed. 
> The right solution is to have the sort call progress as it actually makes 
> progress. This is part of what is going on in HADOOP-1374: a map gets stuck 
> at 100% progress but never finishes.
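A minimal Java sketch of the approach the issue proposes, assuming the sort takes a user-supplied Comparator: wrap the comparator so that it reports progress every N completed comparisons. The ProgressCallback interface, the reporting() helper and the interval are hypothetical illustrations, not Hadoop's API. Because reports come only from comparisons that actually complete, a compare() that hangs stops the reports and a tracker-side timeout can still kill the task, whereas a separate once-a-second timer thread would keep reporting regardless.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch: report progress from inside the sort itself
// instead of from a separate timer thread.
public class ProgressReportingSort {

    /** Stand-in for a progress callback (illustrative, not Hadoop's API). */
    public interface ProgressCallback {
        void progress();
    }

    /**
     * Wraps a comparator so that it reports progress every `interval`
     * completed comparisons. If the wrapped compare() hangs, no further
     * reports are made, so an external timeout can detect the stuck task.
     */
    public static <T> Comparator<T> reporting(Comparator<T> inner,
                                              ProgressCallback cb,
                                              int interval) {
        return new Comparator<T>() {
            private long comparisons = 0;

            @Override
            public int compare(T a, T b) {
                int result = inner.compare(a, b);
                if (++comparisons % interval == 0) {
                    cb.progress();
                }
                return result;
            }
        };
    }

    public static void main(String[] args) {
        Integer[] keys = {5, 3, 9, 1, 7, 2, 8, 6, 4, 0};
        long[] reports = {0};
        Arrays.sort(keys,
                reporting(Comparator.<Integer>naturalOrder(), () -> reports[0]++, 5));
        System.out.println(Arrays.toString(keys)); // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        System.out.println("progress reports: " + reports[0]);
    }
}
```

A real map-side sort would also need equivalent reporting in its merge phase, which is the broader concern raised in the comment above.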

-- 
This message is automatically generated by JIRA.