Re: [jira] Updated: (HADOOP-76) Implement speculative re-execution of reduces

Arkady Borkovsky Thu, 19 Oct 2006 18:47:15 -0700

Just a question:

did we consider "speculative execution of reduces with sub-splitting"?

That is, for each of the last / slow reduce tasks run several tasksubslitting the input of the original task?The implementation would be probably more complicated, but the benefitsmay be huge -- currently, it is not uncommon to see the single lastreduce task running for hours (if not for days).


-- ab

On Oct 19, 2006, at 10:43 AM, Sanjay Dahiya (JIRA) wrote:

     [ http://issues.apache.org/jira/browse/HADOOP-76?page=all ]

Sanjay Dahiya updated HADOOP-76:
--------------------------------

    Attachment: Hadoop-76.patch

This patch is up for review.

Here is the list of changes included in this patch -
Replaced recentTasks to a Map, added a new method in TaskInProgresshasRanOnMachine, which looks at this Map and hasFailedOnMachines().This is used to avoid scheduling multiple reduce instances of sametask on the same node.
Added a PhasedRecordWriter, which takes a RecordWriter, tempName,finalName. Another option was to create a PhasedOutputFormat, thisseems more natural as it works with any existing OutputFormat andRecordWriter. Records are written to tempName and when commit iscalled they are moved to finalName.
ReduceTask.run() - if speculative execution is enabled then reduceoutput is written to a temp location using PhasedRecordWriter. Aftertask finishes the output is written to a final location.If some other speculative instance finishes first thenTaskInProgress.shouldCloseForClosedJob() returns true for the taskId.On TaskTracker the task is killed by Process.destroy() so cleanup codeis in TaskTracker instead of Task. The cleanup of Maps happen in Conf,which is probably misplaced. We could refactor this part for both Mapand Reduce and move cleanup code to some utility classes which, givena Map and Reduce task track the files generated and cleanup if needed.
Added an extra attribute in TaskInProgress - runningSpeculative, toavoid running more than one speculative instances of ReduceTask. Toomany Reduce instances for same task could increase load on Mapmachines, this needs discussion. I can revert this change back toallow some other number of instances of Reduces (MAX_TASK_FAILURES?).
comments
Implement speculative re-execution of reduces
---------------------------------------------

                Key: HADOOP-76
                URL: http://issues.apache.org/jira/browse/HADOOP-76
            Project: Hadoop
         Issue Type: Improvement
         Components: mapred
   Affects Versions: 0.1.0
           Reporter: Doug Cutting
        Assigned To: Sanjay Dahiya
           Priority: Minor
        Attachments: Hadoop-76.patch, spec_reducev.patch
As a first step, reduce task outputs should go to temporary fileswhich are renamed when the task completes.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of theadministrators:http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:http://www.atlassian.com/software/jira

Re: [jira] Updated: (HADOOP-76) Implement speculative re-execution of reduces

Reply via email to