[ https://issues.apache.org/jira/browse/HADOOP-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1431:
----------------------------------
Attachment: HADOOP-1431_1_20070525.patch
Here is a reasonably straightforward patch to address the concerns raised in
this issue - basically I have implemented a ReportingComparator which sends a
progress update every 100 comparisons, and this comparator is used for
sorting/merging in both MapTask & ReduceTask.
The idea is that the 'compare' operation is a metric independent of the actual
sorting/merging algorithm and hence a good indicator of the 'progress' being
made by the sort/merge done by the framework in map/reduce task...
I have adopted a policy similar to the one already employed in MapTask, where
the RecordReader sends progress updates depending on the number of bytes
consumed from the input file; i.e. the ReportingComparator wraps a comparator
and a reporter object and sends an update every 100 comparisons. The advantage
is that the sort algorithm (which could be user code, i.e. by extending
BasicTypeSorterBase) is blissfully unaware of the reporting going on under the
covers, and it also ensures that even user-supplied comparators
(e.g. JobConf.getOutputValueGroupingComparator()) cannot bypass this reporting
mechanism.
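
For readers without the patch at hand, the wrapping idea can be sketched roughly
as below. This is not the code in the attached patch: it is a minimal
illustration assuming a plain java.util.Comparator rather than Hadoop's own
comparator types, and ProgressReporter is a hypothetical stand-in for whatever
reporter object the framework passes in. The class layout, the interval
parameter and the demo are likewise assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for the framework's reporter; only progress() is assumed. */
interface ProgressReporter {
    void progress();
}

/** Wraps a comparator and reports progress once every `interval` comparisons. */
class ReportingComparator<T> implements Comparator<T> {
    private final Comparator<T> delegate;
    private final ProgressReporter reporter;
    private final int interval;
    private long comparisons = 0;

    ReportingComparator(Comparator<T> delegate, ProgressReporter reporter, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.delegate = delegate;
        this.reporter = reporter;
        this.interval = interval;
    }

    @Override
    public int compare(T a, T b) {
        // Count every comparison; report only on every interval-th one.
        if (++comparisons % interval == 0) {
            reporter.progress();
        }
        // The ordering itself is left entirely to the wrapped comparator.
        return delegate.compare(a, b);
    }
}

class ReportingComparatorDemo {
    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        Random rng = new Random(42);
        for (int i = 0; i < 10_000; i++) {
            data.add(rng.nextInt());
        }

        int[] updates = {0}; // counts how often progress() was called
        Comparator<Integer> cmp = new ReportingComparator<>(
                Comparator.<Integer>naturalOrder(), () -> updates[0]++, 100);

        // The sort only sees a Comparator; it knows nothing about reporting.
        data.sort(cmp);
        System.out.println("progress updates sent: " + updates[0]);
    }
}
```

Because the sort only ever sees a Comparator, any sorting or merging code that
takes one picks up the reporting without modification, and a sort that stops
comparing also stops reporting, so a stuck task can still time out.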
Appreciate review/feedback while I continue testing... I know Devaraj has some.
*smile*
> Map tasks can't timeout for failing to call progress
> ----------------------------------------------------
>
> Key: HADOOP-1431
> URL: https://issues.apache.org/jira/browse/HADOOP-1431
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.13.0
> Reporter: Owen O'Malley
> Assigned To: Arun C Murthy
> Fix For: 0.13.0
>
> Attachments: HADOOP-1431_1_20070525.patch
>
>
> Currently the map task runner creates a thread that calls progress every
> second to keep the system from killing the map if the sort takes too long.
> This is the wrong approach, because it will cause stuck tasks to not be
> killed. The right solution is to have the sort call progress as it actually
> makes progress. This is part of what is going on in HADOOP-1374. A map gets
> stuck at 100% progress, but not done.