[ 
https://issues.apache.org/jira/browse/HADOOP-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289674#comment-15289674
 ] 

Steve Loughran commented on HADOOP-13145:
-----------------------------------------

Tested patch -003 against S3 Ireland and Azure.

{code}
Running org.apache.hadoop.fs.contract.s3a.TestS3AContractDistCp
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 223.843 sec - in org.apache.hadoop.fs.contract.s3a.TestS3AContractDistCp

...
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractDistCp
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 65.354 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractDistCp

{code}

Interesting how much faster Azure is: about 65 seconds against 224 for S3.

As it stands, the patch is going to add about 4 minutes to a TestS3A* test 
pattern. Could it be made one of the scalable tests, taking a configuration 
option for scale so that its size can be tuned? There are already some tests 
which use {{scale.test.operation.count}} to control scale; we could have a 
similar option for the distcp file size, with the large file size being driven 
by it. Make it a value in KB and it could easily be tuned by those of us in a 
different country from an S3 endpoint.
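
Roughly what I have in mind, as a sketch only. The property name 
{{scale.test.distcp.file.size.kb}} and its default are made up for 
illustration, and {{java.util.Properties}} stands in for the Hadoop 
{{Configuration}} that the real test would read from:

```java
import java.util.Properties;

/**
 * Sketch only. The key and default below are hypothetical, and
 * java.util.Properties is a stand-in for Hadoop's Configuration.
 */
class DistCpScaleSketch {
    // Hypothetical key, modeled on the existing scale.test.operation.count.
    static final String KEY_FILE_SIZE_KB = "scale.test.distcp.file.size.kb";

    // Hypothetical default of 10 MB, expressed in KB.
    static final int DEFAULT_FILE_SIZE_KB = 10 * 1024;

    /** Returns the large-file size in bytes, driven by the KB-valued option. */
    static long largeFileSizeBytes(Properties conf) {
        String raw = conf.getProperty(KEY_FILE_SIZE_KB);
        int kb = (raw == null) ? DEFAULT_FILE_SIZE_KB : Integer.parseInt(raw.trim());
        if (kb <= 0) {
            throw new IllegalArgumentException(
                KEY_FILE_SIZE_KB + " must be positive, got " + kb);
        }
        // Widen to long before multiplying so large values do not overflow.
        return kb * 1024L;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Someone far from the S3 endpoint could dial the size down.
        conf.setProperty(KEY_FILE_SIZE_KB, "512");
        System.out.println(largeFileSizeBytes(conf));
    }
}
```

The distcp contract test would then size its large file from that value 
rather than from a hard-coded constant.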

> In DistCp, prevent unnecessary getFileStatus call when not preserving 
> metadata.
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-13145
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13145
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>            Reporter: Chris Nauroth
>            Assignee: Chris Nauroth
>         Attachments: HADOOP-13145.001.patch, HADOOP-13145.003.patch
>
>
> After DistCp copies a file, it calls {{getFileStatus}} to get the 
> {{FileStatus}} from the destination so that it can compare to the source and 
> update metadata if necessary.  If the DistCp command was run without the 
> option to preserve metadata attributes, then this additional 
> {{getFileStatus}} call is wasteful.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
