[jira] Updated: (HADOOP-76) Implement speculative re-execution of reduces

Sanjay Dahiya (JIRA) Thu, 19 Oct 2006 10:44:33 -0700

     [ http://issues.apache.org/jira/browse/HADOOP-76?page=all ]


Sanjay Dahiya updated HADOOP-76:
--------------------------------

    Attachment: Hadoop-76.patch

This patch is up for review. 

Here is the list of changes included in this patch - 

Replaced recentTasks to a Map, added a new method in TaskInProgress 
hasRanOnMachine, which looks at this Map and hasFailedOnMachines(). This is 
used to avoid scheduling multiple reduce instances of same task on the same 
node. 

Added a PhasedRecordWriter, which takes a RecordWriter, tempName, finalName. 
Another option was to create a PhasedOutputFormat, this seems more natural as 
it works with any existing OutputFormat and RecordWriter. Records are written 
to tempName and when commit is called they are moved to finalName. 

ReduceTask.run() - if speculative execution is enabled then reduce output is 
written to a temp location using PhasedRecordWriter. After task finishes the 
output is written to a final location. 
If some other speculative instance finishes first then 
TaskInProgress.shouldCloseForClosedJob() returns true for the taskId. On 
TaskTracker the task is killed by Process.destroy() so cleanup code is in 
TaskTracker instead of Task. The cleanup of Maps happen in Conf, which is 
probably misplaced. We could refactor this part for both Map and Reduce and 
move cleanup code to some utility classes which, given a Map and Reduce task 
track the files generated and cleanup if needed. 

Added an extra attribute in TaskInProgress - runningSpeculative, to avoid 
running more than one speculative instances of ReduceTask. Too many Reduce 
instances for same task could increase load on Map machines, this needs 
discussion. I can revert this change back to allow some other number of 
instances of Reduces (MAX_TASK_FAILURES?). 

comments

> Implement speculative re-execution of reduces
> ---------------------------------------------
>
>                 Key: HADOOP-76
>                 URL: http://issues.apache.org/jira/browse/HADOOP-76
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.1.0
>            Reporter: Doug Cutting
>         Assigned To: Sanjay Dahiya
>            Priority: Minor
>         Attachments: Hadoop-76.patch, spec_reducev.patch
>
>
> As a first step, reduce task outputs should go to temporary files which are 
> renamed when the task completes.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (HADOOP-76) Implement speculative re-execution of reduces

Reply via email to