[ http://issues.apache.org/jira/browse/HADOOP-76?page=all ]

Johan Oskarson updated HADOOP-76:
---------------------------------

    Attachment: spec_reducev.patch

I've tried to implement speculative reduces and it seems to be working. However, 
I'd like you to take a look at it, since I'm not familiar with some of the inner 
workings of Hadoop.

As suggested, each attempt writes its output to a temporary name, and the first 
one to finish moves it to the correct output name.
The patch adds a String tmpName to getRecordWriter in OutputFormatBase, as well 
as a close method. OutputFormatBase keeps track of both the tmpName and the final 
name, and once close is called it moves the temporary file to the final name.
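To make the temp-name-then-rename idea concrete, here is a minimal sketch using 
plain java.nio.file instead of Hadoop's FileSystem. The class TmpThenRenameWriter 
and its methods are hypothetical and are not the API in spec_reducev.patch; they 
only illustrate "write under a temporary name, move to the final name on close, 
first attempt to close wins".

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch (not Hadoop's OutputFormatBase): each task attempt
 * writes to its own temporary file next to the final output, and close()
 * tries to move it to the final name. Only the first attempt to close wins.
 */
class TmpThenRenameWriter implements AutoCloseable {
    private final Path tmpPath;   // where this attempt writes
    private final Path finalPath; // the task's real output name
    private final BufferedWriter out;
    private boolean committed;

    TmpThenRenameWriter(Path finalPath, String tmpName) throws IOException {
        this.finalPath = finalPath;
        this.tmpPath = finalPath.resolveSibling(tmpName);
        this.out = Files.newBufferedWriter(tmpPath, StandardCharsets.UTF_8);
    }

    /** Writes one tab-separated record to the temporary file. */
    void write(String key, String value) throws IOException {
        out.write(key + "\t" + value);
        out.newLine();
    }

    /** True if this attempt's output became the final output. */
    boolean isCommitted() {
        return committed;
    }

    @Override
    public void close() throws IOException {
        out.close();
        try {
            // Without REPLACE_EXISTING, Files.move fails if the target exists,
            // so a slower duplicate attempt cannot overwrite the winner.
            Files.move(tmpPath, finalPath);
            committed = true;
        } catch (FileAlreadyExistsException e) {
            // Lost the race: another attempt already produced the output.
            Files.deleteIfExists(tmpPath);
        }
    }
}
```

On a local filesystem the existence check inside Files.move may not be atomic 
with the rename itself, so this only illustrates the protocol; a real version 
would depend on the rename semantics of the underlying distributed filesystem.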

This means the existing output formats don't have to be changed.

This patch would ideally be complemented by better tasktracker selection: I've 
seen cases where there are two final reduce TIPs, and a speculative reduce is then 
assigned to the same node that is already running the other task.
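One way the tasktracker selection could avoid that is sketched below, assuming 
each reduce TIP records which hosts already run one of its attempts. ReduceTip, 
attemptStarted and canRunSpeculativelyOn are hypothetical names for illustration, 
not classes or methods from Hadoop or from the patch.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch: a reduce TIP remembers the hosts running its attempts,
 * and a speculative attempt is only offered to a tracker on a different host.
 */
class ReduceTip {
    private final Set<String> hostsRunningAttempts = new HashSet<>();

    /** Record that an attempt of this TIP was started on the given host. */
    void attemptStarted(String host) {
        hostsRunningAttempts.add(host);
    }

    /**
     * Speculation only makes sense if an attempt is already running, and a
     * duplicate on the same host would compete for the same disk and CPU.
     */
    boolean canRunSpeculativelyOn(String host) {
        return !hostsRunningAttempts.isEmpty() && !hostsRunningAttempts.contains(host);
    }
}
```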

A speculative reduce is started once finishedReduces / numReduceTasks >= 0.7, 
i.e. once at least 70% of the reduces have finished.
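A minimal sketch of that threshold check follows. The patch's actual types are 
not shown here; this assumes both counters are ints, in which case the ratio has 
to be computed in floating point, because int division would yield 0 until every 
reduce had finished. SpeculationPolicy and shouldSpeculateReduces are 
hypothetical names.

```java
/**
 * Hypothetical helper for the 0.7 speculation threshold. The cast to double
 * avoids Java int division, which would make the ratio 0 for any
 * finishedReduces < numReduceTasks.
 */
final class SpeculationPolicy {
    static final double REDUCE_SPECULATION_THRESHOLD = 0.7;

    private SpeculationPolicy() {}

    static boolean shouldSpeculateReduces(int finishedReduces, int numReduceTasks) {
        if (numReduceTasks <= 0) {
            return false; // no reduces, nothing to speculate on
        }
        return (double) finishedReduces / numReduceTasks >= REDUCE_SPECULATION_THRESHOLD;
    }
}
```

For example, with 10 reduces speculation begins after the 7th finishes, while 
with 3 reduces it only begins after all 3 have finished (2/3 is below 0.7), so 
small jobs get little benefit from a fixed ratio.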

That's about it. Looking forward to hearing your input.

> Implement speculative re-execution of reduces
> ---------------------------------------------
>
>          Key: HADOOP-76
>          URL: http://issues.apache.org/jira/browse/HADOOP-76
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Versions: 0.1.0
>     Reporter: Doug Cutting
>     Assignee: Owen O'Malley
>     Priority: Minor
>      Fix For: 0.5.0
>  Attachments: spec_reducev.patch
>
> As a first step, reduce task outputs should go to temporary files which are 
> renamed when the task completes.

-- 
This message is automatically generated by JIRA.