speculative reduce should touch output files only through OutputFormat
----------------------------------------------------------------------
Key: HADOOP-1416
URL: https://issues.apache.org/jira/browse/HADOOP-1416
Project: Hadoop
Issue Type: Bug
Components: mapred
Reporter: Doug Cutting
HADOOP-1127 introduced speculative reduce. It was implemented by having the
MapReduce kernel manipulate a job's output files directly, which is
inconsistent with the architecture of the InputFormat and OutputFormat
interfaces. The kernel should never operate directly on job input or output;
it should always defer to these interfaces.
To correct this, we will need to add some new methods to OutputFormat,
something like:
/** Rename output generated by getRecordWriter(job, tempName). */
void completeOutput(JobConf job, String tempName, String finalName);
/** Clean up output generated by getRecordWriter(job, tempName). Called for
unused outputs. */
void cleanupOutput(JobConf job, String tempName);
These should be implemented in OutputFormatBase, which should be renamed
FileOutputFormat.
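A minimal sketch of how a file-based implementation of these two methods might
behave, for illustration only. It does not use Hadoop classes: the names
OutputCommitSketch, FileOutputCommitSketch and writeTemp are hypothetical, a
plain output directory Path stands in for the JobConf parameter, and the local
filesystem (java.nio.file) stands in for the job's FileSystem.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for the proposed OutputFormat additions (not Hadoop's interface). */
interface OutputCommitSketch {
    /** Rename output written under tempName to finalName. */
    void completeOutput(Path outputDir, String tempName, String finalName);

    /** Discard output written under tempName by an attempt whose result is unused. */
    void cleanupOutput(Path outputDir, String tempName);
}

/** File-based implementation, playing the role the issue assigns to FileOutputFormat. */
class FileOutputCommitSketch implements OutputCommitSketch {

    /** Assumed stand-in for getRecordWriter(job, tempName): writes an attempt's output. */
    void writeTemp(Path outputDir, String tempName, String content) {
        try {
            Files.write(outputDir.resolve(tempName),
                        content.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void completeOutput(Path outputDir, String tempName, String finalName) {
        try {
            // Without REPLACE_EXISTING, the move fails if finalName already exists,
            // so a second attempt cannot overwrite output that was already promoted.
            Files.move(outputDir.resolve(tempName), outputDir.resolve(finalName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void cleanupOutput(Path outputDir, String tempName) {
        try {
            // Deleting a file that is already gone is not treated as an error.
            Files.deleteIfExists(outputDir.resolve(tempName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With two speculative attempts writing to separate temporary names, the kernel
would call completeOutput for the attempt that finishes first and cleanupOutput
for the other, and would never touch the files itself.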
To prevent this from happening again, we should also move JobConf#getInputPath(),
#setInputPath(), #getOutputPath(), and #setOutputPath() to static methods on
FileInputFormat and FileOutputFormat, since these methods are specific to jobs
with file inputs and outputs (and not, e.g., HBase tables).
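The proposed move could look roughly like the sketch below. It is illustrative
only: JobConfSketch, FileInputFormatSketch, FileOutputFormatSketch and the two
property keys are hypothetical names, and paths are plain strings rather than
Hadoop Path objects. The point it shows is that the job configuration remains a
generic key/value store while the file-specific accessors live on the
file-based formats.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for JobConf: a generic store with no file-specific accessors. */
class JobConfSketch {
    private final Map<String, String> props = new HashMap<>();

    String get(String key) {
        return props.get(key);
    }

    void set(String key, String value) {
        props.put(key, value);
    }
}

/** File-specific input configuration lives on the file-based input format. */
class FileInputFormatSketch {
    // Assumed property key, chosen for this sketch only.
    private static final String INPUT_DIR_KEY = "sketch.input.dir";

    static void setInputPath(JobConfSketch job, String path) {
        job.set(INPUT_DIR_KEY, path);
    }

    static String getInputPath(JobConfSketch job) {
        return job.get(INPUT_DIR_KEY);
    }
}

/** File-specific output configuration lives on the file-based output format. */
class FileOutputFormatSketch {
    // Assumed property key, chosen for this sketch only.
    private static final String OUTPUT_DIR_KEY = "sketch.output.dir";

    static void setOutputPath(JobConfSketch job, String path) {
        job.set(OUTPUT_DIR_KEY, path);
    }

    static String getOutputPath(JobConfSketch job) {
        return job.get(OUTPUT_DIR_KEY);
    }
}
```

A job whose input or output is not a file (an HBase table, for example) would
then have no path accessors to call, and kernel code that only sees the job
configuration would have no convenient way to reach output files directly.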