[ 
https://issues.apache.org/jira/browse/HADOOP-16047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742269#comment-16742269
 ] 

Steve Loughran commented on HADOOP-16047:
-----------------------------------------

I'm going to close this as a duplicate, not because it's not a problem, but 
because other people (me) have been complaining about it in the past: 
HADOOP-15281

If you can do the work there, I promise I'll do my best to get it in.

A new test case will inevitably be needed in AbstractContractDistCpTest, so 
that all the stores run it. The S3A-specific subclass can maybe use its metrics 
to count the number of renames, to check that things are working.
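
As a rough illustration of what such a test might assert, here is a toy model 
in Python. It is not Hadoop code: ToyObjectStore, copy_file, run_copy, the 
direct_write switch and the ".distcp.tmp" suffix are all hypothetical names 
made up for this sketch. The only thing it assumes is that a rename on an 
object store behaves like copy + delete, and it counts how often that happens.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store; not the S3A client."""

    def __init__(self):
        self.objects = {}      # key -> bytes
        self.rename_count = 0  # number of rename() calls made
        self.bytes_copied = 0  # bytes moved by the copy half of rename()

    def put(self, key, data):
        self.objects[key] = bytes(data)

    def rename(self, src, dst):
        # No cheap rename here: model it as copy followed by delete.
        data = self.objects[src]
        self.objects[dst] = data
        self.bytes_copied += len(data)
        del self.objects[src]
        self.rename_count += 1


def copy_file(store, target, data, direct_write=False):
    """Write data to target; direct_write is a hypothetical switch."""
    if direct_write:
        # Write straight to the final path, skipping the rename.
        store.put(target, data)
        return
    tmp = target + ".distcp.tmp"  # hypothetical temp name
    store.put(tmp, data)
    store.rename(tmp, target)


def run_copy(files, direct_write):
    """Copy every entry of files into a fresh store and return the store."""
    store = ToyObjectStore()
    for path, data in files.items():
        copy_file(store, path, data, direct_write=direct_write)
    return store


if __name__ == "__main__":
    files = {"dest/a": b"x" * 10, "dest/b": b"y" * 20}
    for flag in (False, True):
        s = run_copy(files, flag)
        print(flag, s.rename_count, s.bytes_copied)
```

With the switch off, every file costs one rename and its bytes are copied a 
second time; with it on, the rename count stays at zero. A real test would 
read the rename count from the store's own statistics rather than a 
hand-rolled counter.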

> Avoid expensive rename when DistCp is writing to S3
> ---------------------------------------------------
>
>                 Key: HADOOP-16047
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16047
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3, tools/distcp
>            Reporter: Andrew Olson
>            Priority: Major
>
> When writing to an S3-based target, the temp file and rename logic in 
> RetriableFileCopyCommand adds some unnecessary cost to the job, as the rename 
> operation does a server-side copy + delete in S3 [1]. The renames are 
> parallelized across all of the DistCp map tasks, so the severity is mitigated 
> to some extent. However a configuration property to conditionally allow 
> distributed copies to avoid that expense and write directly to the target 
> path would improve performance considerably.
> [1] 
> https://github.com/apache/hadoop/blob/release-3.2.0-RC1/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md#object-stores-vs-filesystems



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

