[
https://issues.apache.org/jira/browse/HADOOP-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15290800#comment-15290800
]
Steve Loughran commented on HADOOP-13145:
-----------------------------------------
There's no S3 service in my country, so I need to test against a datacentre in a
country with a lower tax regime that is still covered by EU data protection
legislation. That means Ireland, though I could also benchmark Frankfurt.
If you think the large files repeat the same coverage as the smaller ones, then
yes, please unify them. Even so, I'd like the size to be configurable, so that I
could set up test runs with smaller datasets and we keep the option of test runs
with larger files.
For those tests, it'd be nice if the S3A setup explicitly turned the multipart
threshold down (8MB?), and the partition sizes likewise, so that the runs
exercise both the multipart code path and distcp.
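As a rough sketch of what that test setup might look like, the fragment below lowers both S3A settings to 8 MB. It assumes that "partition sizes" corresponds to the fs.s3a.multipart.size property and that the overrides live in a test-only site file; the 8 MB figure is just the value floated above, written in bytes. S3 itself rejects parts smaller than 5 MB (other than the last one), so 8 MB stays above that floor.

```xml
<!-- Hypothetical test-only overrides; the values are assumptions, not tuned settings. -->
<property>
  <name>fs.s3a.multipart.threshold</name>
  <!-- 8 MB in bytes: uploads larger than this take the multipart path -->
  <value>8388608</value>
</property>
<property>
  <name>fs.s3a.multipart.size</name>
  <!-- 8 MB in bytes: size of each uploaded part -->
  <value>8388608</value>
</property>
```

With values this low, even a modest test file of a few tens of MB should be split into several parts.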
> In DistCp, prevent unnecessary getFileStatus call when not preserving
> metadata.
> -------------------------------------------------------------------------------
>
> Key: HADOOP-13145
> URL: https://issues.apache.org/jira/browse/HADOOP-13145
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: HADOOP-13145.001.patch, HADOOP-13145.003.patch
>
>
> After DistCp copies a file, it calls {{getFileStatus}} to get the
> {{FileStatus}} from the destination so that it can compare to the source and
> update metadata if necessary. If the DistCp command was run without the
> option to preserve metadata attributes, then this additional
> {{getFileStatus}} call is wasteful.
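
The saving described in the issue can be illustrated with a minimal, self-contained sketch. It is not the attached patch: PreserveSketch, CountingFileSystem, Attribute and afterCopy are hypothetical stand-ins, used only to show the idea of skipping the destination status lookup when no attributes are being preserved.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-ins for illustration; these are not DistCp's real classes.
final class PreserveSketch {
    enum Attribute { REPLICATION, BLOCK_SIZE, USER, GROUP, PERMISSION }

    // Minimal fake of a destination file system that counts status lookups.
    static final class CountingFileSystem {
        int statusCalls = 0;

        String getFileStatus(String path) {
            statusCalls++;
            return "status-of:" + path;
        }
    }

    // After a copy, look up the destination status only if some attribute
    // has to be compared against the source and possibly updated.
    static void afterCopy(CountingFileSystem fs, String target, Set<Attribute> preserve) {
        if (preserve.isEmpty()) {
            return; // nothing to preserve, so skip the extra round trip
        }
        String targetStatus = fs.getFileStatus(target);
        // ... compare targetStatus with the source and update metadata here ...
    }

    public static void main(String[] args) {
        CountingFileSystem fs = new CountingFileSystem();
        afterCopy(fs, "/dest/a", EnumSet.noneOf(Attribute.class));
        afterCopy(fs, "/dest/b", EnumSet.of(Attribute.PERMISSION));
        System.out.println("getFileStatus calls: " + fs.statusCalls);
    }
}
```

Against an object store such as S3, each skipped lookup is one fewer remote request per copied file, which is where the benefit would be expected to show up.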