[
https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610719#action_12610719
]
Alejandro Abdelnur commented on HADOOP-3150:
--------------------------------------------
There are a few different topics being discussed in this issue:
# Changing from JT to Task the responsibility for committing the output of a
task
# Making the committing of the output of a task generic, not HDFS-specific
# Being able to create side OutputStreams (not RecordWriters) from a task
IMO this issue should only address the *first topic*. The gain is that the JT
no longer performs the task output commit itself and is left only with
coordinating it.
The *third topic*, as has been suggested, could be addressed by HADOOP-3149 by
adding a static method {{getOutputStream(JobConf conf, String baseName)}} to
{{MultipleOutputs}}. This method would use the filename namespacing introduced
by HADOOP-3149 (previously HADOOP-3258) to create a unique file under the job
working output directory. Note that {{MultipleOutputs}} does not implement
{{OutputFormat}}, so IMO we are not overloading it with unrelated behavior;
{{MultipleOutputs}} just becomes a means to create additional outputs
({{OutputFormat}}s or {{OutputStream}}s) in the context of the output of a
task, handled consistently with the task output on both successful completion
and failure.
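To make the idea concrete, here is a minimal, self-contained sketch using plain
java.nio rather than the Hadoop APIs. The class {{SideOutputs}}, the
{{_temporary/<attemptId>}} layout, and the {{baseName-partition}} naming are
assumptions for illustration only, not what HADOOP-3149 actually implements.
The point it shows: a side stream is created under a per-attempt working
directory, so it is promoted on success and discarded on failure exactly like
the regular task output.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch: side files live in a per-attempt working directory. */
class SideOutputs {
    private final Path outputDir;   // final job output directory
    private final Path workDir;     // assumed layout: <output>/_temporary/<attemptId>
    private final String partition; // assumed suffix that makes names unique per task

    SideOutputs(Path outputDir, String attemptId, String partition) throws IOException {
        this.outputDir = outputDir;
        this.workDir = outputDir.resolve("_temporary").resolve(attemptId);
        this.partition = partition;
        Files.createDirectories(workDir);
    }

    /** Opens a raw stream (not a RecordWriter) with a namespaced file name. */
    OutputStream getOutputStream(String baseName) throws IOException {
        return Files.newOutputStream(workDir.resolve(baseName + "-" + partition));
    }

    /** Success path: promote every file of this attempt into the output directory. */
    void commit() throws IOException {
        for (Path file : listWorkDir()) {
            Files.move(file, outputDir.resolve(file.getFileName()),
                       StandardCopyOption.REPLACE_EXISTING);
        }
        Files.delete(workDir);
    }

    /** Failure path: discard everything this attempt wrote. */
    void abort() throws IOException {
        for (Path file : listWorkDir()) {
            Files.delete(file);
        }
        Files.delete(workDir);
    }

    // Snapshot the directory listing first so we do not modify it while iterating.
    private List<Path> listWorkDir() throws IOException {
        try (Stream<Path> files = Files.list(workDir)) {
            return files.collect(Collectors.toList());
        }
    }
}
```

A task would call {{getOutputStream("stats")}}, write to the stream, and then
have {{commit()}} or {{abort()}} invoked by whatever drives the task output
handling; nothing appears in the output directory until {{commit()}} runs.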
The *second topic* is a whole topic of its own and I think it should be left
to its own Jira:
# It should make the commit of a task output independent of HDFS
# It should handle the commit of a task output atomically (at least against
each individual storage the outputs go to)
# It should not leave the commit to the {{OutputFormat}}, as jobs can use their
own output formats. IMO it should be something like a {{TaskOutputCommitter}}
for each storage type that is part of the Hadoop code (cannot be set by a job)
and is run once per storage instance used by the task (ideally in a
transaction-like style).
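A rough sketch of what such a design could look like, in plain Java with no
Hadoop dependencies. Everything here is hypothetical: the shape of the
{{TaskOutputCommitter}} interface, the {{TaskCommitCoordinator}} and
{{RecordingCommitter}} classes, and the prepare/commit/abort split are
assumptions chosen to illustrate "once per storage instance" and
"transaction-like", not an existing API. Note that atomicity is only per
storage: once one storage has committed, a later failure cannot undo it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical: one implementation per storage type, provided by the framework. */
interface TaskOutputCommitter {
    void prepare() throws Exception; // check that the output can be committed
    void commit() throws Exception;  // make the output visible
    void abort();                    // discard the output
}

/** Hypothetical: runs each committer once per storage instance used by the task. */
class TaskCommitCoordinator {
    // Keyed by storage instance id, so registering the same instance twice is a no-op.
    private final Map<String, TaskOutputCommitter> committers = new LinkedHashMap<>();

    void register(String storageId, TaskOutputCommitter committer) {
        committers.putIfAbsent(storageId, committer);
    }

    /** Prepare every storage first; commit only if all of them are ready. */
    boolean commitAll() {
        List<TaskOutputCommitter> all = new ArrayList<>(committers.values());
        try {
            for (TaskOutputCommitter c : all) {
                c.prepare();
            }
        } catch (Exception e) {
            for (TaskOutputCommitter c : all) {
                c.abort();
            }
            return false;
        }
        for (int i = 0; i < all.size(); i++) {
            try {
                all.get(i).commit();
            } catch (Exception e) {
                // Already committed storages stay committed; abort the failed one and the rest.
                for (int j = i; j < all.size(); j++) {
                    all.get(j).abort();
                }
                return false;
            }
        }
        return true;
    }
}

/** Test double that records the calls it receives. */
class RecordingCommitter implements TaskOutputCommitter {
    private final String name;
    private final List<String> log;
    private final boolean failPrepare;

    RecordingCommitter(String name, List<String> log, boolean failPrepare) {
        this.name = name;
        this.log = log;
        this.failPrepare = failPrepare;
    }

    public void prepare() throws Exception {
        if (failPrepare) {
            throw new Exception(name + " is not ready");
        }
        log.add(name + ":prepare");
    }

    public void commit() { log.add(name + ":commit"); }

    public void abort() { log.add(name + ":abort"); }
}
```

Because the committers are registered by the framework per storage instance
and not supplied by the job's {{OutputFormat}}, a job with a custom output
format would still get the same commit behavior.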
> Move task file promotion into the task
> --------------------------------------
>
> Key: HADOOP-3150
> URL: https://issues.apache.org/jira/browse/HADOOP-3150
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Owen O'Malley
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.19.0
>
> Attachments: 3150.patch
>
>
> We need to move the task file promotion from the JobTracker to the Task and
> move it down into the output format.