[ 
https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614686#action_12614686
 ] 

Alejandro Abdelnur commented on HADOOP-3150:
--------------------------------------------

A couple of comments on the latest patch (JUL/17):

*1 {{cleanupTask}} method*

The {{OutputFormat.cleanupTask(TaskAttemptContext taskContext, boolean 
promote)}} method name and what the method is supposed to do do not match. It 
is not intuitive. I would suggest having 2 methods 
{{commitTask(TaskAttemptContext ctx)}} and {{discardTask(TaskAttemptContext 
ctx)}} instead, then it is clear what is the intention when looking at the 
methods and their usage (a boolean to do exactly the opposite is confusing).

*2 New methods in the {{OutputFormat}}*

Instead adding the following 4 to the {{OutputFormat}}:

{code:java}
+  public abstract void setupJob(JobContext context) throws IOException;
+  public abstract void cleanupJob(JobContext context) throws IOException;
+  public abstract void setupTask(TaskAttemptContext taskContext)  throws 
IOException;
+  public abstract void cleanupTask(TaskAttemptContext taskContext, boolean 
promote) throws IOException;
{code}

I would put them in a separate abstract class {{OutputCommitter}} and add to 
the {{OutputFormat}} a single abstract method {{OutputCommitter 
getOutputCommitter()}}. 

The {{FileOutputFormat}} would implement the {{getOutputCommitter()}} method 
returning a {{FileOutputCommitter}}.

The {{Task.done(Umbilical)}} when taking care of the side files would 
instantiate a {{FileOutputCommitter}} (taking the class from a config property) 
and do the commit for side files.

The pros with approach are:

* It gets rid of the static commit method if {{FileOutputFormat}} for the 
special case of side files.
* It makes commit of side files pluggable as well.
* It reduces the methods in the {{OutputFormat}} to what are relevant to 
output, handling the commit as a separate concern.
* It will make less prone to errors for developers creating their own 
{{OutputFormat}} implementations as it is more clear the separation of concerns.
* The code in {{Task}} will be simpler.




> Move task file promotion into the task
> --------------------------------------
>
>                 Key: HADOOP-3150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3150
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.19.0
>
>         Attachments: 3150.patch, patch-3150.txt, patch-3150.txt
>
>
> We need to move the task file promotion from the JobTracker to the Task and 
> move it down into the output format.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to