[ 
https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610719#action_12610719
 ] 

Alejandro Abdelnur commented on HADOOP-3150:
--------------------------------------------

There are a few different topics being discussed in this issue:

# Moving the responsibility for committing the output of a task from the JT to 
the Task
# Making the commit of a task's output generic rather than HDFS-specific
# Being able to create side OutputStreams (not RecordWriters) from a task

IMO this issue should address only the *first topic*. The gain is that the JT 
no longer performs the task output commit itself; it is left only with 
coordinating it.
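
A minimal sketch of that split, assuming the coordination amounts to the JT 
granting the commit to a single attempt of each task. The names 
{{CommitCoordinator}}, {{TaskAttempt}}, {{canCommit}} and {{commitOutput}} are 
hypothetical and are not taken from the patch, and the local filesystem stands 
in for the real output storage:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for what is left on the JT side: it only decides
// which attempt of a task may commit and never touches the output files.
class CommitCoordinator {
    private final Map<String, String> grantedAttempt = new ConcurrentHashMap<>();

    // Assumption: the first attempt that asks for a given task gets the commit.
    boolean canCommit(String taskId, String attemptId) {
        return attemptId.equals(grantedAttempt.computeIfAbsent(taskId, t -> attemptId));
    }
}

// Hypothetical task-side promotion: the attempt moves its own temporary
// output to the final location once the coordinator allows it.
class TaskAttempt {
    static boolean commitOutput(CommitCoordinator jt, String taskId, String attemptId,
                                Path tempOutput, Path finalOutput) throws IOException {
        if (!jt.canCommit(taskId, attemptId)) {
            Files.deleteIfExists(tempOutput); // the losing attempt discards its output
            return false;
        }
        Files.move(tempOutput, finalOutput);  // the promotion happens in the task
        return true;
    }
}
```

With two attempts of the same task, only the first call to {{commitOutput}} 
promotes its file; the second one deletes its temporary output and returns 
{{false}}.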

The *third topic*, as has been suggested, could be addressed by Hadoop-3149 by 
adding a static method {{getOutputStream(JobConf conf, String baseName)}}. 
This method would use the filename namespacing introduced by Hadoop-3149 
(previously Hadoop-3258) to create a unique file under the job working output 
directory. Note that {{MultipleOutputs}} does not implement {{OutputFormat}}; 
because of this, IMO, we are not overloading it with unrelated behavior. 
{{MultipleOutputs}} just becomes a means to create additional outputs, whether 
{{OutputFormat}}s or {{OutputStream}}s, in the context of the output of a 
task, handled consistently with the task output on both successful completion 
and failure.
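
A rough sketch of the idea, not the Hadoop-3149 code. To keep it 
self-contained it takes the working directory and task id as explicit 
arguments instead of a {{JobConf}}, uses the local filesystem, and assumes a 
{{<baseName>-<taskId>}} naming scheme; the {{SideOutputs}} class and 
{{uniqueName}} helper are hypothetical:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class SideOutputs {
    // Assumed naming scheme: <baseName>-<taskId>. The actual namespacing is
    // whatever Hadoop-3149 defines; this only illustrates per-task uniqueness.
    static String uniqueName(String baseName, String taskId) {
        if (!baseName.matches("[A-Za-z0-9]+")) {
            throw new IllegalArgumentException("baseName must be alphanumeric: " + baseName);
        }
        return baseName + "-" + taskId;
    }

    // Opens a side stream under the task's working output directory, so the
    // file is promoted or discarded together with the rest of the task output.
    static OutputStream getOutputStream(Path taskWorkDir, String taskId, String baseName)
            throws IOException {
        Files.createDirectories(taskWorkDir);
        Path file = taskWorkDir.resolve(uniqueName(baseName, taskId));
        // CREATE_NEW fails if the name is already taken instead of overwriting.
        return Files.newOutputStream(file, StandardOpenOption.CREATE_NEW,
                StandardOpenOption.WRITE);
    }
}
```

Because the stream lives under the task working directory, it needs no commit 
logic of its own in this sketch; it simply follows the fate of the task output.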

The *second topic* is a whole thing on its own, and I think it should be left 
to its own Jira:

# It should make the commit of a task output independent of HDFS
# It should handle the commit of a task output atomically (at least within 
each individual storage the outputs go to)
# It should not leave the commit to the {{OutputFormat}}, as jobs can use their 
own output formats. IMO it should be something like a {{TaskOutputCommitter}} 
for each storage type, which is part of the Hadoop code (cannot be set by a 
job) and is run once per storage instance used by the task (ideally in a 
transaction-like style).
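
To make the last point more concrete, here is one possible shape for it. Only 
the name {{TaskOutputCommitter}} comes from the proposal above; the 
{{prepare}}/{{commit}}/{{abort}} methods and the {{TaskCommit.commitAll}} 
driver are assumptions, sketching a two-phase, transaction-like commit over 
one committer per storage instance:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical interface: one implementation per storage type, shipped with
// the framework rather than configured by the job.
interface TaskOutputCommitter {
    void prepare() throws Exception; // stage the output; may fail
    void commit();                   // make the staged output visible
    void abort();                    // discard anything staged so far
}

class TaskCommit {
    // Runs each committer once. Nothing is committed unless every storage
    // instance prepared successfully; otherwise all started ones are aborted.
    static boolean commitAll(Collection<TaskOutputCommitter> committers) {
        List<TaskOutputCommitter> started = new ArrayList<>();
        try {
            for (TaskOutputCommitter c : committers) {
                started.add(c); // recorded first so a failed prepare is aborted too
                c.prepare();
            }
        } catch (Exception e) {
            for (TaskOutputCommitter c : started) {
                c.abort();
            }
            return false;
        }
        for (TaskOutputCommitter c : committers) {
            c.commit();
        }
        return true;
    }
}
```

This gives all-or-nothing behavior only up to the end of the prepare phase; a 
{{commit}} that fails part-way through would still need storage-specific 
handling.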


> Move task file promotion into the task
> --------------------------------------
>
>                 Key: HADOOP-3150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3150
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.19.0
>
>         Attachments: 3150.patch
>
>
> We need to move the task file promotion from the JobTracker to the Task and 
> move it down into the output format.
