[
https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613962#action_12613962
]
Alejandro Abdelnur commented on HADOOP-3150:
--------------------------------------------
My thoughts are along the lines of my first comment on this issue.
Committing an output depends not on the {{OutputFormat}} type but on the
storage. Putting it in {{OutputFormat}}, even if done in the base class
{{FileOutputFormat}}, still implies it belongs to the {{OutputFormat}}.
If a job has multiple outputs (via {{MultipleFileOutputFormat}},
{{MultipleOutputs}} or side files) against the same storage, it would seem
logical to enforce the same commit logic, and the commit should be atomic.
Ideally, if a job has multiple outputs to different storages, the commit should
also be atomic.
Even if the storage access is provided by the job (i.e., a custom FileSystem),
it should be some entity close to the storage that provides the output commit
logic.
Granted, some or all of the above comments are far-fetched or may seem
unrealistic, so I could live (and am living) with them going unaddressed. But
from the point of view of getting the concerns right, I would like to see a
separate interface {{OutputCommitter}} providing the commit logic, even if it
does exactly what the patch does today and addresses none of the above
comments. Maybe {{OutputFormat}} could have a new method that returns the
{{OutputCommitter}} class to use.
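
A minimal sketch of that separation, in self-contained Java rather than against
the Hadoop classes. Everything here is an assumption made for illustration: the
method names ({{commit}}, {{abort}}, {{getOutputCommitter}}), the
{{InMemoryStorage}} stand-in for a FileSystem, and {{SketchFormat}} are
hypothetical and are not taken from the patch. It only shows the commit logic
living with the storage while the output format merely hands back the committer
to use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical commit contract, owned by the storage and not by the format.
interface OutputCommitter {
    void commit(String taskId); // promote the task's temporary output
    void abort(String taskId);  // discard the task's temporary output
}

// Hypothetical hook: the format only says which committer to use.
interface OutputFormat {
    OutputCommitter getOutputCommitter();
}

// Toy in-memory storage standing in for a FileSystem (assumption).
class InMemoryStorage implements OutputCommitter {
    private final Map<String, List<String>> temporary = new HashMap<>();
    private final List<String> promoted = new ArrayList<>();

    // Task output is staged per task until it is committed or aborted.
    void write(String taskId, String record) {
        temporary.computeIfAbsent(taskId, k -> new ArrayList<>()).add(record);
    }

    @Override
    public void commit(String taskId) {
        List<String> records = temporary.remove(taskId);
        if (records != null) {
            promoted.addAll(records);
        }
    }

    @Override
    public void abort(String taskId) {
        temporary.remove(taskId);
    }

    // Returns a copy so callers cannot modify the promoted output.
    List<String> promoted() {
        return new ArrayList<>(promoted);
    }
}

// Any number of formats can write to one storage and share its commit logic.
class SketchFormat implements OutputFormat {
    private final InMemoryStorage storage;

    SketchFormat(InMemoryStorage storage) {
        this.storage = storage;
    }

    @Override
    public OutputCommitter getOutputCommitter() {
        return storage;
    }
}

class CommitSketchDemo {
    public static void main(String[] args) {
        InMemoryStorage storage = new InMemoryStorage();
        OutputFormat primary = new SketchFormat(storage);
        OutputFormat side = new SketchFormat(storage);

        storage.write("task_1", "a");
        storage.write("task_2", "b");

        primary.getOutputCommitter().commit("task_1"); // task_1 is promoted
        side.getOutputCommitter().abort("task_2");     // task_2 is discarded

        System.out.println(storage.promoted());
    }
}
```

Both formats return the same committer because they target the same storage, so
the commit behaviour cannot drift between the main output and side outputs.
The sketch does not attempt atomicity across several storages.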
> Move task file promotion into the task
> --------------------------------------
>
> Key: HADOOP-3150
> URL: https://issues.apache.org/jira/browse/HADOOP-3150
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Owen O'Malley
> Assignee: Amareshwari Sriramadasu
> Fix For: 0.19.0
>
> Attachments: 3150.patch, patch-3150.txt
>
>
> We need to move the task file promotion from the JobTracker to the Task and
> move it down into the output format.