[ https://issues.apache.org/jira/browse/HADOOP-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298406#comment-14298406 ]
Thomas Demoor commented on HADOOP-11525:
----------------------------------------
I agree that this would be very nice functionality to have. But I feel that if
we adopt it (which I think we should), it should be used throughout the entire
codebase, not only in FsShell. Since "committing by renaming" is the default
pattern on HDFS, there are more code paths that could benefit from this
functionality. The MapReduce committers in
org.apache.hadoop.mapreduce.lib.output immediately come to mind: for instance,
FileOutputCommitter is currently often replaced by a custom
Direct(File)OutputCommitter (EMR, S3 users "in the know", MapR, ...), which
this functionality could make unnecessary. There are probably more.
> FileSystem should expose some performance characteristics for caller (e.g.,
> FsShell) to choose the right algorithm.
> -------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-11525
> URL: https://issues.apache.org/jira/browse/HADOOP-11525
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools
> Affects Versions: 2.6.0
> Reporter: Lei (Eddy) Xu
> Assignee: Lei (Eddy) Xu
> Attachments: HADOOP-11525.000.patch
>
>
> When running {{hadoop fs -put}}, {{FsShell}} creates a {{._COPYING_}} file
> in the target directory and then renames it to the target file when the
> write is done. However, for some target systems, such as S3, Azure and
> Swift, a partially failed write request (i.e., a {{PUT}}) has no side
> effects, while the {{rename}} operation is expensive.
> {{FileSystem}} should expose these characteristics so that operations such
> as {{CommandWithDestination#copyStreamToTarget()}} can detect them and
> choose the right approach.
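The choice described in the issue can be sketched in standalone Java. This is a minimal illustration under stated assumptions, not the attached patch and not Hadoop's API: {{java.nio.file}} on a local disk stands in for a Hadoop {{FileSystem}}, and {{StoreCharacteristics}} and {{copyToTarget}} are hypothetical names. The copy goes straight to the target only when a failed write leaves nothing behind and rename is expensive; otherwise it keeps the temp-file-plus-rename behaviour.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: StoreCharacteristics and copyToTarget are illustrative
// names, not Hadoop APIs. java.nio.file stands in for a Hadoop FileSystem.
public class CopyStrategySketch {

    /** Hypothetical characteristics a file system could advertise to callers. */
    public static final class StoreCharacteristics {
        /** A failed or partial write leaves no visible file (e.g. an object-store PUT). */
        final boolean failedWriteHasNoSideEffects;
        /** Rename is expensive (e.g. implemented as copy + delete). */
        final boolean renameIsExpensive;

        public StoreCharacteristics(boolean failedWriteHasNoSideEffects, boolean renameIsExpensive) {
            this.failedWriteHasNoSideEffects = failedWriteHasNoSideEffects;
            this.renameIsExpensive = renameIsExpensive;
        }
    }

    /**
     * Copies {@code in} to {@code target}. Returns true if the target was
     * written directly, false if the temp-file-plus-rename path was used.
     */
    public static boolean copyToTarget(InputStream in, Path target, StoreCharacteristics fs)
            throws IOException {
        if (fs.failedWriteHasNoSideEffects && fs.renameIsExpensive) {
            // Object-store case: write straight to the target and skip the rename.
            // (On a local disk the flags are only simulated; the write is not atomic.)
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return true;
        }
        // Default case: write to "<name>._COPYING_" so a half-written file never
        // appears under the final name, then rename it into place.
        Path temp = target.resolveSibling(target.getFileName() + "._COPYING_");
        try {
            Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
            Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            try {
                Files.deleteIfExists(temp); // clean up the partial temp file
            } catch (IOException cleanup) {
                e.addSuppressed(cleanup);
            }
            throw e;
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("copy-sketch");
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        boolean s3Like = copyToTarget(new ByteArrayInputStream(data), dir.resolve("a.txt"),
                new StoreCharacteristics(true, true));
        boolean hdfsLike = copyToTarget(new ByteArrayInputStream(data), dir.resolve("b.txt"),
                new StoreCharacteristics(false, false));
        System.out.println("object-store-like: direct=" + s3Like);
        System.out.println("HDFS-like: direct=" + hdfsLike);
    }
}
```

Keeping the temp-file path as the fallback means a store that advertises nothing retains today's behaviour; only stores that declare both properties skip the rename.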