[ 
https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12614034#action_12614034
 ] 

Devaraj Das commented on HADOOP-3150:
-------------------------------------

Alejandro, you can provide different implementations of the various output 
storage related APIs (including commit) and still use the same RecordWriter. The 
converse is also true: you could use the same storage with different 
RecordWriters. In essence, the OutputFormat captures both the storage and the 
RecordWriter to use on that storage. All OutputFormats against the same storage 
could use the same commit implementation (the base one), but is there any reason 
why we should enforce that?
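
To illustrate the pairing described above, here is a minimal sketch in plain 
Java. It does not use the real Hadoop classes: the Storage and RecordWriter 
interfaces, InMemoryStorage, and the demo method are hypothetical names invented 
for this example. It only shows that the same storage (and its commit) can be 
combined with different record writers.

```java
import java.util.ArrayList;
import java.util.List;

public class OutputFormatSketch {

    // Hypothetical interface: how a key/value pair is serialized.
    interface RecordWriter {
        String format(String key, String value);
    }

    // Hypothetical interface: where output is staged and how it is committed.
    interface Storage {
        void writeTemp(String line);
        void commit();
        List<String> committed();
    }

    // Toy storage that stages lines in memory and promotes them on commit.
    static class InMemoryStorage implements Storage {
        private final List<String> temp = new ArrayList<>();
        private final List<String> done = new ArrayList<>();

        public void writeTemp(String line) { temp.add(line); }

        public void commit() {
            done.addAll(temp);
            temp.clear();
        }

        public List<String> committed() { return done; }
    }

    // The "output format" is just the pairing of a storage with a writer.
    static class OutputFormat {
        private final Storage storage;
        private final RecordWriter writer;

        OutputFormat(Storage storage, RecordWriter writer) {
            this.storage = storage;
            this.writer = writer;
        }

        void write(String key, String value) {
            storage.writeTemp(writer.format(key, value));
        }

        void commit() { storage.commit(); }
    }

    // Same storage type and commit logic, caller-supplied record writer.
    static List<String> demo(RecordWriter writer) {
        Storage storage = new InMemoryStorage();
        OutputFormat format = new OutputFormat(storage, writer);
        format.write("a", "1");
        format.commit();
        return storage.committed();
    }

    public static void main(String[] args) {
        System.out.println(demo((k, v) -> k + "\t" + v)); // tab-separated writer
        System.out.println(demo((k, v) -> k + "=" + v));  // key=value writer
    }
}
```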

The atomicity, I think, is output format dependent. For example, in the case of 
FileOutputFormat, assuming maps/reduces are idempotent, if the commit happened 
only partially (1 out of 10 files got promoted), the next attempt of the same 
task could either discard the previously committed file (delete before commit) 
or accept what has already been committed. Maybe for the dfs, if we provide an 
atomic rename(tmpdir/<set-of-files>, outputdir), it would handle atomicity. For 
an output format like the SQLOutputFormat, it is a different game. My point is 
that atomicity across multiple outputs on different storages is hard to 
guarantee, so it should be handled by logic specific to each OutputFormat.
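
The two recovery options for a partially committed file output can be sketched 
as follows. This is an illustration on the local filesystem using java.nio.file, 
not the Hadoop FileSystem API; the class name, the promote and demo methods, and 
the part-0/part-1 file names are assumptions made up for the example. It also 
assumes task output is idempotent, so a file promoted by an earlier attempt is 
equivalent to the one the retry produced.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IdempotentCommitSketch {

    // Promote every file in tmpDir into outDir and return how many were moved.
    // If a file was already promoted by an earlier attempt, either replace it
    // (deleteBeforeCommit) or keep it and drop the new temporary copy.
    static int promote(Path tmpDir, Path outDir, boolean deleteBeforeCommit)
            throws IOException {
        Files.createDirectories(outDir);
        List<Path> pending;
        try (Stream<Path> files = Files.list(tmpDir)) {
            pending = files.collect(Collectors.toList());
        }
        int moved = 0;
        for (Path src : pending) {
            Path dst = outDir.resolve(src.getFileName());
            if (Files.exists(dst)) {
                if (!deleteBeforeCommit) {
                    Files.delete(src); // accept what was already committed
                    continue;
                }
                Files.delete(dst);     // delete before commit
            }
            Files.move(src, dst);
            moved++;
        }
        return moved;
    }

    // Simulates a retry after a partial commit: part-0 was already promoted
    // by an earlier attempt, part-1 was not.
    static String demo(boolean deleteBeforeCommit) {
        try {
            Path tmp = Files.createTempDirectory("task-tmp");
            Path out = Files.createTempDirectory("job-out");
            Files.writeString(out.resolve("part-0"), "old");
            Files.writeString(tmp.resolve("part-0"), "new");
            Files.writeString(tmp.resolve("part-1"), "new");
            int moved = promote(tmp, out, deleteBeforeCommit);
            return "moved=" + moved
                    + " part-0=" + Files.readString(out.resolve("part-0"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // keeps the earlier part-0
        System.out.println(demo(true));  // replaces the earlier part-0
    }
}
```

Neither option is atomic across the whole set of files; each file is promoted 
individually, which is why an atomic directory-level rename would help here.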

> Move task file promotion into the task
> --------------------------------------
>
>                 Key: HADOOP-3150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3150
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.19.0
>
>         Attachments: 3150.patch, patch-3150.txt
>
>
> We need to move the task file promotion from the JobTracker to the Task and 
> move it down into the output format.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
