[ https://issues.apache.org/jira/browse/HADOOP-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris Nauroth updated HADOOP-13145:
-----------------------------------
Attachment: HADOOP-13145-branch-2.004.patch
I'm attaching patch v004.
* Removed redundant single-file tests and small multi-file tests.
* Introduced {{scale.test.distcp.file.size.kb}} configuration property for
tuning test file sizes. The default is 10 MB.
* Set the multi-part configuration properties to 8 MB, so that with the
default 10 MB file size, the tests exercise multi-part upload.
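The sizing in the bullets above can be checked with a little arithmetic. This is only a sketch: {{MultipartMath}} and {{partCount}} are hypothetical names and are not part of the patch or of Hadoop. It assumes that 10 MB means 10240 KB and that a file is split into parts of at most the configured part size.

```java
// Hypothetical helper for illustration only; not part of Hadoop or the patch.
final class MultipartMath {
    static final long KB = 1024L;
    static final long MB = 1024L * KB;

    /** Number of parts needed to upload a file, using ceiling division. */
    static long partCount(long fileSizeBytes, long partSizeBytes) {
        if (fileSizeBytes < 0 || partSizeBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return (fileSizeBytes + partSizeBytes - 1) / partSizeBytes;
    }

    public static void main(String[] args) {
        long fileSizeKb = 10240L;   // assumed KB value of the 10 MB default
        long partSize = 8L * MB;    // the 8 MB part size mentioned above
        // 10 MB does not fit in a single 8 MB part, so two parts are needed.
        System.out.println(partCount(fileSizeKb * KB, partSize)); // 2
    }
}
```

Under these assumptions, a test file at the default size needs more than one part, whereas a file of 8 MB or less would fit in a single part.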
With this version of the patch, the S3A test runs in ~55 seconds for me, and
the WASB test runs in ~65 seconds. I completed a full parallel-test run
against S3 buckets in us-west-2.
> In DistCp, prevent unnecessary getFileStatus call when not preserving
> metadata.
> -------------------------------------------------------------------------------
>
> Key: HADOOP-13145
> URL: https://issues.apache.org/jira/browse/HADOOP-13145
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: HADOOP-13145-branch-2.004.patch, HADOOP-13145.001.patch,
> HADOOP-13145.003.patch
>
>
> After DistCp copies a file, it calls {{getFileStatus}} to get the
> {{FileStatus}} from the destination so that it can compare to the source and
> update metadata if necessary. If the DistCp command was run without the
> option to preserve metadata attributes, then this additional
> {{getFileStatus}} call is wasteful.
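The idea in the description above can be sketched as follows. This is a minimal illustration, not the actual DistCp code: {{PreserveSketch}}, {{Attr}}, {{CountingTarget}} and {{afterCopy}} are hypothetical stand-ins, and the real change may be structured differently.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are not Hadoop classes.
final class PreserveSketch {
    /** Attributes a copy might be asked to preserve (illustrative subset). */
    enum Attr { REPLICATION, BLOCK_SIZE, USER, GROUP, PERMISSION }

    /** Stand-in for a destination file system that counts status lookups. */
    static final class CountingTarget {
        int statusCalls = 0;

        String getFileStatus(String path) {
            statusCalls++;
            return "status:" + path;
        }
    }

    /**
     * Post-copy step. Returns true if the destination status was fetched.
     * When nothing is to be preserved, the lookup is skipped entirely.
     */
    static boolean afterCopy(CountingTarget target, String path, Set<Attr> preserve) {
        if (preserve.isEmpty()) {
            return false; // nothing to compare or update, so avoid the extra call
        }
        String status = target.getFileStatus(path);
        // A real implementation would compare 'status' with the source here
        // and update only the requested attributes.
        return status != null;
    }

    public static void main(String[] args) {
        CountingTarget target = new CountingTarget();
        afterCopy(target, "/dst/file", EnumSet.noneOf(Attr.class));
        afterCopy(target, "/dst/file", EnumSet.of(Attr.USER));
        System.out.println(target.statusCalls); // 1
    }
}
```

On object stores, where each status lookup is a remote request, skipping the call for every copied file is where the saving would come from.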