[jira] Updated: (HADOOP-1431) Map tasks can't timeout for failing to call progress

Arun C Murthy (JIRA) Fri, 25 May 2007 09:10:37 -0700

     [ 
https://issues.apache.org/jira/browse/HADOOP-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Arun C Murthy updated HADOOP-1431:
----------------------------------

    Attachment: HADOOP-1431_1_20070525.patch

Here is a reasonably straightforward patch to address the concerns raised in 
this issue: I have implemented a ReportingComparator which sends a progress 
update every 100 comparisons, and this comparator is used for 
sorting/merging in both MapTask & ReduceTask.

The idea is that the 'compare' operation is a metric independent of the actual 
sorting/merging algorithm and hence a good indicator of the 'progress' being 
made by the sort/merge done by the framework in map/reduce task... 

I have adopted a policy similar to the one already employed in MapTask, where 
the RecordReader sends progress updates depending on the number of bytes 
consumed from the input file, i.e. the ReportingComparator wraps a comparator 
and a reporter object and sends an update every 100 comparisons. The advantage 
is that the sort algorithm (which could be user code, i.e. by extending 
BasicTypeSorterBase) is blissfully unaware of the reporting going on under the 
covers, and it also ensures that there is no way even user-supplied comparators 
(e.g. JobConf.getOutputValueGroupingComparator()) can bypass this reporting 
mechanism.
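
The wrapping idea described above can be illustrated with a minimal, 
self-contained sketch. This is not the code in the attached patch: the 
ProgressReporter interface, the constructor signature, and the configurable 
interval are assumptions made for illustration, and plain java.util.Comparator 
stands in for whatever comparator type the framework actually uses.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.concurrent.atomic.AtomicInteger;

public class ReportingComparatorSketch {

    /** Hypothetical stand-in for the framework's reporter object. */
    interface ProgressReporter {
        void progress();
    }

    /**
     * Wraps a comparator and a reporter, and sends one progress update
     * every `interval` comparisons. The sort code only ever sees a
     * Comparator, so it needs no knowledge of the reporting.
     */
    static final class ReportingComparator<T> implements Comparator<T> {
        private final Comparator<? super T> delegate;
        private final ProgressReporter reporter;
        private final int interval;
        private int comparisons = 0; // comparisons since the last update

        ReportingComparator(Comparator<? super T> delegate,
                            ProgressReporter reporter,
                            int interval) {
            if (interval <= 0) {
                throw new IllegalArgumentException("interval must be positive");
            }
            this.delegate = delegate;
            this.reporter = reporter;
            this.interval = interval;
        }

        @Override
        public int compare(T a, T b) {
            // Count the comparison and report once the interval is reached.
            if (++comparisons >= interval) {
                comparisons = 0;
                reporter.progress();
            }
            // The ordering itself is left entirely to the wrapped comparator.
            return delegate.compare(a, b);
        }
    }

    public static void main(String[] args) {
        AtomicInteger updates = new AtomicInteger();
        Comparator<Integer> cmp = new ReportingComparator<>(
                Comparator.<Integer>naturalOrder(), updates::incrementAndGet, 100);

        Integer[] data = new Integer[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = data.length - i; // reverse order, so the sort has work to do
        }
        Arrays.sort(data, cmp);

        System.out.println("progress updates sent: " + updates.get());
    }
}
```

Because updates are driven only by comparisons that actually happen, a sort 
that stops making comparisons also stops reporting, which is what allows a 
stuck task to time out.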

Appreciate review/feedback while I continue testing... I know Devaraj has some. 
*smile*

> Map tasks can't timeout for failing to call progress
> ----------------------------------------------------
>
>                 Key: HADOOP-1431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1431
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Owen O'Malley
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1431_1_20070525.patch
>
>
> Currently the map task runner creates a thread that calls progress every 
> second to keep the system from killing the map if the sort takes too long. 
> This is the wrong approach, because it will cause stuck tasks to not be 
> killed. The right solution is to have the sort call progress as it actually 
> makes progress. This is part of what is going on in HADOOP-1374. A map gets 
> stuck at 100% progress, but not done.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
