[ 
https://issues.apache.org/jira/browse/HADOOP-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14299099#comment-14299099
 ] 

Lei (Eddy) Xu commented on HADOOP-11525:
----------------------------------------

[~thodemoor] and [~cnauroth] Thank you so much for your reviews and input!

My intention with this patch was to provide a more generic framework that lets 
applications (e.g., MR, FsShell, and many more) tune their performance without 
dragging in dependencies on the concrete {{FileSystem}} implementations, so 
that we can avoid code like the following in {{hadoop-common}}.

{code}
if (fs instanceof DistributedFileSystem) {
  ...
} else if (fs instanceof S3AFileSystem || fs instanceof NativeAzureFileSystem) {
  ...
}
{code}

The framework should definitely provide more flags (e.g., 
{{Characteristics#isRenameExpensive()}} and others). It would be great to get 
more input on which flags we should offer. Additionally, the default values of 
{{Characteristics}} are set on the assumption that the {{FileSystem}} is 
{{DistributedFileSystem}}, so that with the current code base applications 
still work _correctly_, though not necessarily _optimally_. 
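
To illustrate the idea, here is a minimal, self-contained sketch of how a caller could branch on a flag instead of on the concrete class. Only the names {{Characteristics}} and {{isRenameExpensive()}} come from this comment. Everything else ({{SketchFileSystem}}, {{getCharacteristics()}}, {{CopyStrategy}}, the two example file systems) is hypothetical and not taken from the attached patch, and Java 8 default methods are used only for brevity. The default returns {{false}} to mirror the {{DistributedFileSystem}} assumption described above.

```java
// Hypothetical sketch; none of these types are the actual HADOOP-11525 patch.
public class CharacteristicsSketch {

  /** Hypothetical performance hints a file system could expose. */
  interface Characteristics {
    // Default mirrors the DistributedFileSystem assumption: rename is cheap.
    default boolean isRenameExpensive() {
      return false;
    }
  }

  /** Stand-in abstraction; this is NOT org.apache.hadoop.fs.FileSystem. */
  interface SketchFileSystem {
    // Hypothetical accessor; file systems that do nothing get the defaults.
    default Characteristics getCharacteristics() {
      return new Characteristics() {};
    }
  }

  /** Keeps the defaults, like an HDFS-style file system. */
  static class HdfsLikeFileSystem implements SketchFileSystem {}

  /** Object-store-style file system that declares rename as expensive. */
  static class ObjectStoreLikeFileSystem implements SketchFileSystem {
    @Override
    public Characteristics getCharacteristics() {
      return new Characteristics() {
        @Override
        public boolean isRenameExpensive() {
          return true;
        }
      };
    }
  }

  /** The two ways a caller such as FsShell could write the target file. */
  enum CopyStrategy { WRITE_TEMP_THEN_RENAME, WRITE_DIRECT }

  /** Chooses a strategy from the flag alone, with no instanceof checks. */
  static CopyStrategy chooseCopyStrategy(SketchFileSystem fs) {
    return fs.getCharacteristics().isRenameExpensive()
        ? CopyStrategy.WRITE_DIRECT
        : CopyStrategy.WRITE_TEMP_THEN_RENAME;
  }

  public static void main(String[] args) {
    System.out.println(chooseCopyStrategy(new HdfsLikeFileSystem()));
    System.out.println(chooseCopyStrategy(new ObjectStoreLikeFileSystem()));
  }
}
```

In this sketch the caller depends only on the abstract interface, so {{hadoop-common}} would not need to reference any concrete file system class. A real design would likely need more than one flag before writing directly is safe, for example one stating that a failed write leaves no partial file behind.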

[~cnauroth] You are right. For this particular case ({{copyStreamToTarget}}), 
it is better to put the "transactional write" semantics into {{FileSystem}} to 
reduce the burden on applications. 

[~thodemoor] and [~cnauroth] Do you think the {{Characteristics}} approach has 
benefits beyond this "transactional write"? Is it worth pursuing further?

Looking forward to your input.

> FileSystem should expose some performance characteristics for caller (e.g., 
> FsShell) to choose the right algorithm.
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-11525
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11525
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools
>    Affects Versions: 2.6.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HADOOP-11525.000.patch
>
>
> When running {{hadoop fs -put}}, {{FsShell}} creates a {{._COPYING_.}} file 
> in the target directory and then renames it to the target file when the write 
> is done. However, for some target systems, such as S3, Azure and Swift, a 
> partially failed write request (i.e., {{PUT}}) has no side effect, while the 
> {{rename}} operation is expensive. 
> {{FileSystem}} should expose some characteristics so that operations such 
> as {{CommandWithDestination#copyStreamToTarget()}} can detect them and choose 
> the right approach.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
