[ https://issues.apache.org/jira/browse/HADOOP-1127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated HADOOP-1127:
----------------------------------

    Status: Patch Available  (was: Open)

> Speculative Execution and output of Reduce tasks
> ------------------------------------------------
>
>                 Key: HADOOP-1127
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1127
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.0
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1127_20070328_1.patch, HADOOP-1127_20070331_2.patch,
> HADOOP-1127_20070402_3.patch, HADOOP-1127_20070403_4.patch
>
>
> We've recently seen instances where jobs run with 'speculative execution'
> are quite unstable and fail with an *AlreadyBeingCreatedException* reported
> at the NameNode. We could also run into hairy situations where the output of
> a failed Reduce task clashes with the output of a successful task for the
> same TIP.
>
> As it stands, speculative execution relies on the PhasedFileSystem, which
> creates a temp output file; on task completion, that file is 'moved' to its
> final position via a call to PhasedFileSystem.commit from ReduceTask.run().
> This has led to issues such as the above.
>
> Proposal:
> The idea is to do this uniformly for all Reduce tasks, i.e. all reducers
> create temp files, and the JobTracker then performs a serialized 'commit'
> that moves each temp file to its final position.
>
> We create the temp file in the job's output directory itself:
> <output_dir>/_<taskid> (emphasis on the leading '_')
>
> On task completion, we'll add that temp file's path to the TaskStatus, and
> the JobTracker then moves that file to its final position.
>
> Thoughts?
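The proposal amounts to a small commit protocol: each reduce attempt writes to `<output_dir>/_<taskid>`, reports that path in its status, and the JobTracker serially promotes one attempt per TIP to the final location. The Java sketch below simulates that flow on the local filesystem with `java.nio.file` rather than HDFS. `TaskStatusSketch`, `CommitterSketch`, `runReduce`, the `part-%05d` final file name, and the task IDs are hypothetical illustrations, not Hadoop's actual classes or API, and the first-committer-wins handling of speculative duplicates is an assumption about how the serialized commit would resolve them.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

final class ReduceOutputCommitSketch {

    /** Hypothetical stand-in for what a finished reduce attempt reports in its TaskStatus. */
    static final class TaskStatusSketch {
        final int partition;   // identifies the TIP; speculative attempts share it
        final String taskId;   // identifies this particular attempt
        final Path tempOutput; // <output_dir>/_<taskid>

        TaskStatusSketch(int partition, String taskId, Path tempOutput) {
            this.partition = partition;
            this.taskId = taskId;
            this.tempOutput = tempOutput;
        }
    }

    /** Reducer side: write to <output_dir>/_<taskid>, never to the final file name. */
    static TaskStatusSketch runReduce(Path outputDir, int partition, String taskId, String data)
            throws IOException {
        Path temp = outputDir.resolve("_" + taskId);
        Files.write(temp, data.getBytes(StandardCharsets.UTF_8));
        return new TaskStatusSketch(partition, taskId, temp);
    }

    /** JobTracker side: commits are serialized, so at most one attempt per TIP is promoted. */
    static final class CommitterSketch {
        private final Path outputDir;
        private final Map<Integer, String> winners = new HashMap<>(); // partition -> committed taskId

        CommitterSketch(Path outputDir) {
            this.outputDir = outputDir;
        }

        synchronized boolean commit(TaskStatusSketch status) throws IOException {
            if (winners.containsKey(status.partition)) {
                // A sibling attempt of the same TIP already committed: drop this temp file.
                Files.deleteIfExists(status.tempOutput);
                return false;
            }
            // Hypothetical final naming scheme, e.g. part-00000 for partition 0.
            Path target = outputDir.resolve(String.format("part-%05d", status.partition));
            // No REPLACE_EXISTING: the move throws if the final file unexpectedly exists.
            Files.move(status.tempOutput, target);
            winners.put(status.partition, status.taskId);
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("job-output");
        CommitterSketch jobTracker = new CommitterSketch(out);
        // Two attempts of the same reduce TIP (the second one speculative).
        TaskStatusSketch first = runReduce(out, 0, "task_0001_r_000000_0", "attempt 0\n");
        TaskStatusSketch second = runReduce(out, 0, "task_0001_r_000000_1", "attempt 1\n");
        System.out.println(jobTracker.commit(first));  // prints true
        System.out.println(jobTracker.commit(second)); // prints false
    }
}
```

Because `commit` is `synchronized`, two speculative attempts that finish at the same moment cannot both promote their temp files, which is the clash the issue describes. The leading `_` presumably lets in-progress files be told apart from (and skipped alongside) final outputs sharing the same directory.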