[
https://issues.apache.org/jira/browse/HADOOP-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299099#comment-14299099
]
Lei (Eddy) Xu commented on HADOOP-11525:
----------------------------------------
[~thodemoor] and [~cnauroth], thank you so much for your reviews and input!
My intention with this patch was to provide a more generic framework that lets
applications (e.g., MR, FsShell, and many more) tune performance without
dragging in dependencies on the concrete {{FileSystem}} implementations, so
that we can avoid code like the following in {{hadoop-common}}:
{code}
if (fs instanceof DistributedFileSystem) {
...
} else if (fs instanceof S3AFileSystem || fs instanceof NativeAzureFileSystem) {
...
}
{code}
The framework should definitely provide more flags (e.g.,
{{Characteristics#isRenameExpensive()}}). It would be great to get more input
on which flags we should offer. Additionally, the default value(s) of
{{Characteristics}} are set on the assumption that the {{FileSystem}} is
{{DistributedFileSystem}}, so that with the current code base applications
still work _correctly_, though not necessarily _optimally_.
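To make the idea concrete, here is a minimal sketch of what such a {{Characteristics}} type could look like. Only {{isRenameExpensive()}} comes from the discussion above. The interface shape, the {{isPartialWriteVisible()}} flag, and the stand-in file system classes are hypothetical, and this is not necessarily what the attached patch implements.

```java
// Hypothetical sketch; none of these types are real Hadoop classes.
public class CharacteristicsSketch {

  /** Flags a file system could expose; defaults model HDFS-like behavior. */
  interface Characteristics {
    // Flag named in the comment above; cheap rename is the assumed default.
    default boolean isRenameExpensive() { return false; }

    // Hypothetical extra flag: can readers observe a failed, partial write?
    default boolean isPartialWriteVisible() { return true; }
  }

  /** Stand-in for a FileSystem base class (an assumption, not the Hadoop one). */
  static abstract class SketchFileSystem {
    private static final Characteristics DEFAULTS = new Characteristics() { };

    // Subclasses that say nothing inherit the HDFS-like defaults.
    Characteristics getCharacteristics() { return DEFAULTS; }
  }

  /** Behaves like the default, so it overrides nothing. */
  static class HdfsLike extends SketchFileSystem { }

  /** Models an object store: rename is a copy, a failed PUT leaves nothing. */
  static class ObjectStoreLike extends SketchFileSystem {
    @Override
    Characteristics getCharacteristics() {
      return new Characteristics() {
        @Override public boolean isRenameExpensive() { return true; }
        @Override public boolean isPartialWriteVisible() { return false; }
      };
    }
  }

  /** Caller-side decision with no instanceof checks on concrete classes. */
  static String chooseCopyStrategy(SketchFileSystem fs) {
    Characteristics c = fs.getCharacteristics();
    if (c.isRenameExpensive() && !c.isPartialWriteVisible()) {
      return "write-direct";
    }
    return "write-temp-then-rename";
  }

  public static void main(String[] args) {
    System.out.println(chooseCopyStrategy(new HdfsLike()));
    System.out.println(chooseCopyStrategy(new ObjectStoreLike()));
  }
}
```

With defaults like these, a file system that overrides nothing keeps the existing temp-file-and-rename path, which is the "correct but not necessarily optimal" behavior described above.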
[~cnauroth], you are right. For this particular case ({{copyStreamToTarget}}),
it is better to put the "transactional write" semantics into {{FileSystem}} to
reduce the burden on applications.
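As a rough illustration of that alternative, the sketch below moves the choice into the file system itself, so the caller only asks for an all-or-nothing write. Everything here is hypothetical: {{writeAtomically}}, the in-memory {{SketchFs}} classes, and the rename counter are invented for the example and are not Hadoop APIs. The temporary suffix only mirrors the one used by {{FsShell}}.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; these are not Hadoop classes.
public class TransactionalWriteSketch {

  /** Tiny in-memory stand-in for a file system namespace. */
  static abstract class SketchFs {
    final Map<String, byte[]> files = new TreeMap<>();
    int renameCount = 0;

    void rename(String src, String dst) {
      files.put(dst, files.remove(src));
      renameCount++;
    }

    static byte[] readAll(InputStream in) throws IOException {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      byte[] chunk = new byte[4096];
      int n;
      while ((n = in.read(chunk)) != -1) {
        buf.write(chunk, 0, n);
      }
      return buf.toByteArray();
    }

    /** Either the whole stream becomes visible at target, or nothing does. */
    abstract void writeAtomically(String target, InputStream in) throws IOException;
  }

  /** HDFS-like: stage the data under a temporary name, then rename. */
  static class RenameBasedFs extends SketchFs {
    @Override
    void writeAtomically(String target, InputStream in) throws IOException {
      String tmp = target + "._COPYING_";
      try {
        files.put(tmp, readAll(in));
        rename(tmp, target);
      } finally {
        files.remove(tmp); // no-op after a successful rename
      }
    }
  }

  /** Object-store-like: a PUT is modeled as all-or-nothing, so write in place. */
  static class PutBasedFs extends SketchFs {
    @Override
    void writeAtomically(String target, InputStream in) throws IOException {
      files.put(target, readAll(in));
    }
  }

  public static void main(String[] args) throws IOException {
    SketchFs[] all = { new RenameBasedFs(), new PutBasedFs() };
    for (SketchFs fs : all) {
      // The caller is identical for both implementations.
      fs.writeAtomically("/dst/file", new ByteArrayInputStream("hello".getBytes()));
      System.out.println(fs.getClass().getSimpleName() + " renames=" + fs.renameCount);
    }
  }
}
```

The caller code is the same for both implementations; only the file system decides whether a rename is needed.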
[~thodemoor] and [~cnauroth], do you think the {{Characteristics}} approach has
benefits beyond this "transactional write"? Is it worth pursuing further?
I look forward to your input.
> FileSystem should expose some performance characteristics for caller (e.g.,
> FsShell) to choose the right algorithm.
> -------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-11525
> URL: https://issues.apache.org/jira/browse/HADOOP-11525
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools
> Affects Versions: 2.6.0
> Reporter: Lei (Eddy) Xu
> Assignee: Lei (Eddy) Xu
> Attachments: HADOOP-11525.000.patch
>
>
> When running {{hadoop fs -put}}, {{FsShell}} creates a {{._COPYING_.}} file
> in the target directory, and then renames it to the target file when the
> write is done. However, for some target systems, such as S3, Azure and Swift,
> a partially failed write request (i.e., {{PUT}}) has no side effects, while
> the {{rename}} operation is expensive.
> {{FileSystem}} should expose some characteristics so that operations such
> as {{CommandWithDestination#copyStreamToTarget()}} can detect them and choose
> the right approach.