[ 
https://issues.apache.org/jira/browse/MAPREDUCE-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved MAPREDUCE-460.
-------------------------------

    Resolution: Not A Problem

This has gone stale, so I am closing it out. We can discuss how best to solve 
this in a new ticket now that MR2 is out.
                
> Should be able to re-run jobs, collecting only missing output
> -------------------------------------------------------------
>
>                 Key: MAPREDUCE-460
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-460
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Bryan Pendleton
>            Assignee: Owen O'Malley
>            Priority: Minor
>
> For jobs with no side effects (roughly == jobs with speculative execution 
> enabled), if partial output has been generated, it should be possible to 
> re-run the job, and fill in the missing pieces. I have now run the same job 
> twice, once finishing 42 of 44 reduce tasks, another time finishing only 17. 
> Each time, many nodes have failed, causing many tasks to fail (in one 
> case, 5k failures from 15k map tasks, 23 failures from 44 reduces), but some 
> valid output was generated. Since the output is only dependent on the input, 
> and both jobs used the same input, I will now be able to combine these two 
> failed task outputs to get a completed job's output. This should be something 
> that can be more automatic.
> In particular, it should be possible to resubmit a job, with a list of 
> partitions that should be ignored. A special Combiner, or pre-Combiner, would 
> throw out any map output for partitions that have already been successfully 
> completed, thus reducing the amount of data that needs to be reduced to 
> complete the job. It would, of course, be nice to support "filling in" 
> existing outputs, rather than having to do a move operation on completed 
> outputs.
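
The filtering step proposed in the description can be sketched roughly as 
follows. This is a minimal standalone illustration, not Hadoop code and not 
an existing feature: the class CompletedPartitionFilter and its methods are 
hypothetical names, and plain Java collections stand in for map output. The 
only piece taken from Hadoop is the partition formula, which mirrors the 
default HashPartitioner. Real jobs use Writable keys, whose hashCode() values 
differ from those of the boxed Integer keys assumed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: drop map output bound for already-completed partitions. */
public class CompletedPartitionFilter {

    /**
     * Assigns a key to a partition by masking the sign bit of its hash and
     * taking the remainder (same formula as Hadoop's default HashPartitioner).
     */
    public static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Returns only the entries whose partition is NOT in the completed set. */
    public static <K, V> List<Map.Entry<K, V>> dropCompleted(
            List<Map.Entry<K, V>> mapOutput,
            Set<Integer> completedPartitions,
            int numPartitions) {
        List<Map.Entry<K, V>> remaining = new ArrayList<>();
        for (Map.Entry<K, V> entry : mapOutput) {
            int partition = partitionFor(entry.getKey(), numPartitions);
            if (!completedPartitions.contains(partition)) {
                remaining.add(entry);
            }
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Stand-in for map output: Integer keys 0..7 (assumed for illustration).
        List<Map.Entry<Integer, String>> mapOutput = new ArrayList<>();
        for (int key = 0; key < 8; key++) {
            mapOutput.add(Map.entry(key, "value-" + key));
        }
        // Pretend partitions 0 and 2 finished successfully in an earlier run.
        Set<Integer> completed = Set.of(0, 2);
        for (Map.Entry<Integer, String> entry : dropCompleted(mapOutput, completed, 4)) {
            System.out.println(entry.getKey() + " -> partition "
                    + partitionFor(entry.getKey(), 4));
        }
    }
}
```

With four partitions and partitions 0 and 2 marked complete, only the entries 
with odd keys survive the filter, so the reduce side would receive less data. 
How the list of completed partitions would be passed to a resubmitted job is 
left open here, as it is in the description.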

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
