[
https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614686#action_12614686
]
Alejandro Abdelnur commented on HADOOP-3150:
--------------------------------------------
A couple of comments on the latest patch (JUL/17):
*1 {{cleanupTask}} method*
The {{OutputFormat.cleanupTask(TaskAttemptContext taskContext, boolean
promote)}} method name and what the method is supposed to do do not match. It
is not intuitive. I would suggest having 2 methods
{{commitTask(TaskAttemptContext ctx)}} and {{discardTask(TaskAttemptContext
ctx)}} instead, then it is clear what is the intention when looking at the
methods and their usage (a boolean to do exactly the opposite is confusing).
*2 New methods in the {{OutputFormat}}*
Instead adding the following 4 to the {{OutputFormat}}:
{code:java}
+ public abstract void setupJob(JobContext context) throws IOException;
+ public abstract void cleanupJob(JobContext context) throws IOException;
+ public abstract void setupTask(TaskAttemptContext taskContext) throws
IOException;
+ public abstract void cleanupTask(TaskAttemptContext taskContext, boolean
promote) throws IOException;
{code}
I would put them in a separate abstract class {{OutputCommitter}} and add to
the {{OutputFormat}} a single abstract method {{OutputCommitter
getOutputCommitter()}}.
The {{FileOutputFormat}} would implement the {{getOutputCommitter()}} method
returning a {{FileOutputCommitter}}.
The {{Task.done(Umbilical)}} when taking care of the side files would
instantiate a {{FileOutputCommitter}} (taking the class from a config property)
and do the commit for side files.
The pros with approach are:
* It gets rid of the static commit method if {{FileOutputFormat}} for the
special case of side files.
* It makes commit of side files pluggable as well.
* It reduces the methods in the {{OutputFormat}} to what are relevant to
output, handling the commit as a separate concern.
* It will make less prone to errors for developers creating their own
{{OutputFormat}} implementations as it is more clear the separation of concerns.
* The code in {{Task}} will be simpler.
> Move task file promotion into the task
> --------------------------------------
>
> Key: HADOOP-3150
> URL: https://issues.apache.org/jira/browse/HADOOP-3150
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Owen O'Malley
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.19.0
>
> Attachments: 3150.patch, patch-3150.txt, patch-3150.txt
>
>
> We need to move the task file promotion from the JobTracker to the Task and
> move it down into the output format.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.