[
https://issues.apache.org/jira/browse/HADOOP-13145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Nauroth updated HADOOP-13145:
-----------------------------------
Attachment: HADOOP-13145.001.patch
The attached v001 patch avoids the unnecessary {{getFileStatus}} call.
The cost of that call is particularly pronounced when running DistCp with a
destination on S3A, where eventual consistency on S3 can cause the
{{getFileStatus}} call to fail with {{FileNotFoundException}}. The whole
MapReduce task then fails, retries, and copies all of its data again.
[~rajesh.balamohan], I know you saw this with some recent large copies to S3A.
Would you be interested in testing this patch? So far, I don't have my own
repro. Note that the patch only helps when the DistCp command is not preserving
metadata attributes, so don't use the {{-p}} option.
Cc [[email protected]].
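For illustration only, here is a minimal sketch of the idea, not the patch
itself. {{Store}}, {{LaggingStore}}, {{Attr}} and {{copyFile}} are hypothetical
stand-ins, not Hadoop or DistCp classes. The sketch assumes a destination whose
status lookup fails immediately after a write, which roughly models the
eventual-consistency case described above.

```java
import java.io.FileNotFoundException;
import java.util.EnumSet;

// Hypothetical stand-ins for illustration; none of these are Hadoop types.
class SkipStatusSketch {

    /** Attributes a copy might be asked to preserve (hypothetical subset). */
    enum Attr { USER, GROUP, PERMISSION }

    /** Minimal destination abstraction: write data, look up status. */
    interface Store {
        void write(String path, byte[] data);
        long statusLength(String path) throws FileNotFoundException;
    }

    /** A store whose status lookup always fails, as if the write were not yet visible. */
    static class LaggingStore implements Store {
        int writes = 0;
        int statusCalls = 0;

        @Override
        public void write(String path, byte[] data) {
            writes++;
        }

        @Override
        public long statusLength(String path) throws FileNotFoundException {
            statusCalls++;
            throw new FileNotFoundException(path + " is not visible yet");
        }
    }

    /** Copies one file and looks up destination status only if something must be preserved. */
    static void copyFile(Store dest, String path, byte[] data, EnumSet<Attr> preserve)
            throws FileNotFoundException {
        dest.write(path, data);
        if (preserve.isEmpty()) {
            // Nothing to compare or update, so the status lookup is skipped.
            return;
        }
        // A real implementation would compare this status to the source
        // and update the preserved attributes as needed.
        dest.statusLength(path);
    }

    public static void main(String[] args) {
        LaggingStore store = new LaggingStore();
        try {
            copyFile(store, "/dst/a", new byte[] {1, 2, 3}, EnumSet.noneOf(Attr.class));
            System.out.println("no preserve: status calls = " + store.statusCalls);
            copyFile(store, "/dst/b", new byte[] {4, 5, 6}, EnumSet.of(Attr.USER));
        } catch (FileNotFoundException e) {
            System.out.println("preserve USER: copy failed: " + e.getMessage());
        }
    }
}
```

With an empty preserve set the copy never touches the status lookup, so the
lagging store cannot fail it. As soon as one attribute must be preserved, the
lookup is required again and the failure comes back, which is why the patch
does not help when {{-p}} is used.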
> In DistCp, prevent unnecessary getFileStatus call when not preserving
> metadata.
> -------------------------------------------------------------------------------
>
> Key: HADOOP-13145
> URL: https://issues.apache.org/jira/browse/HADOOP-13145
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Reporter: Chris Nauroth
> Assignee: Chris Nauroth
> Attachments: HADOOP-13145.001.patch
>
>
> After DistCp copies a file, it calls {{getFileStatus}} to get the
> {{FileStatus}} from the destination so that it can compare to the source and
> update metadata if necessary. If the DistCp command was run without the
> option to preserve metadata attributes, then this additional
> {{getFileStatus}} call is wasteful.