Cool. We'll give it a look.
On Jun 30, 2006, at 8:33 AM, Johan Oskarson (JIRA) wrote:
[ http://issues.apache.org/jira/browse/HADOOP-76?page=all ]
Johan Oskarson updated HADOOP-76:
---------------------------------
Attachment: spec_reducev.patch
I've tried to implement speculative reduces and it seems to be
working. However, I'd like you to take a look at it, since I'm not
familiar with some of the inner workings of Hadoop.
As suggested, each reduce writes its output to a temporary name, and
the first one to finish moves it to the correct output name.
The patch adds a String tmpName to getRecordWriter in OutputFormatBase
and a close method. Basically, OutputFormatBase keeps track of the
tmpName and the final name; once close is called, it moves the tmp
output to the final name.
This means the current output formats don't have to be changed.
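To make the write-to-tmp-then-move idea concrete, here is a minimal sketch. It is not the code in spec_reducev.patch: it uses java.nio.file on a local directory instead of Hadoop's file system classes, and the class name TmpOutputSketch, its methods, and the file names are all hypothetical. It only illustrates that two attempts can write under different tmp names and that only the first close() gets the final name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; local files stand in for the job's output directory.
class TmpOutputSketch {
    private final Path tmpPath;   // where this attempt writes
    private final Path finalPath; // the name the job's output is expected under

    TmpOutputSketch(Path dir, String tmpName, String finalName) {
        this.tmpPath = dir.resolve(tmpName);
        this.finalPath = dir.resolve(finalName);
    }

    // Append one record to this attempt's temporary file.
    void write(String record) throws IOException {
        Files.write(tmpPath, (record + "\n").getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Move tmp to final. Returns true if this attempt won, false if another
    // attempt had already produced the final file.
    boolean close() throws IOException {
        try {
            // No REPLACE_EXISTING option, so the move fails if finalPath exists.
            Files.move(tmpPath, finalPath);
            return true;
        } catch (FileAlreadyExistsException e) {
            // Lost the race: discard this attempt's output.
            Files.deleteIfExists(tmpPath);
            return false;
        }
    }
}
```

With two instances sharing the same final name but different tmp names, the first close() returns true and the second returns false and removes its tmp file. Whether the rename is atomic depends on the underlying file system, so this is a sketch of the idea rather than a guarantee.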
This patch would ideally be complemented by better tasktracker
selection. I've seen instances where there are two final reduce tips
and a speculative reduce is then assigned to the same node that is
already running the other task.
A speculative reduce will be started if finishedReduces /
numReduceTasks >= 0.7
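As a rough illustration of that threshold check (a sketch, not the patch's code; the class and method names are made up here), note that the ratio has to be computed in floating point, because plain int division in Java would yield 0 until every reduce had finished:

```java
// Hypothetical helper showing the 0.7 completion threshold described above.
class SpeculativeReduceSketch {
    static final double THRESHOLD = 0.7;

    // True once at least 70% of the job's reduces have finished.
    static boolean shouldSpeculate(int finishedReduces, int numReduceTasks) {
        if (numReduceTasks <= 0) {
            return false; // nothing to speculate on, and avoids dividing by zero
        }
        // Cast before dividing so the ratio is not truncated to an int.
        return (double) finishedReduces / numReduceTasks >= THRESHOLD;
    }
}
```

For example, with 10 reduces the check is false at 6 finished and true at 7.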
That's about it. Looking forward to hearing your input.
Implement speculative re-execution of reduces
---------------------------------------------
Key: HADOOP-76
URL: http://issues.apache.org/jira/browse/HADOOP-76
Project: Hadoop
Type: Improvement
Components: mapred
Versions: 0.1.0
Reporter: Doug Cutting
Assignee: Owen O'Malley
Priority: Minor
Fix For: 0.5.0
Attachments: spec_reducev.patch
As a first step, reduce task outputs should go to temporary files
which are renamed when the task completes.