[ https://issues.apache.org/jira/browse/HADOOP-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243980#comment-17243980 ]
Steve Loughran commented on HADOOP-16775:
-----------------------------------------
Note: now that S3 is strongly consistent, this fix is moot. Older releases without it are safe to use.
> DistCp reuses the same temp file within the task attempt for different files.
> -----------------------------------------------------------------------------
>
> Key: HADOOP-16775
> URL: https://issues.apache.org/jira/browse/HADOOP-16775
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Affects Versions: 3.0.0
> Reporter: Amir Shenavandeh
> Assignee: Amir Shenavandeh
> Priority: Major
> Labels: DistCp, S3, hadoop-tools
> Fix For: 3.2.2
>
> Attachments: HADOOP-16775-v1.patch, HADOOP-16775.patch
>
>
> Hadoop DistCp reuses the same temp file name for every file copied within a
> task attempt, then renames each temp file to its target name. On S3 the
> rename is itself a server-side copy. For copies to S3 this causes
> inconsistency, because S3 guarantees read-after-write consistency only for
> brand-new objects; the contents of overwritten objects may also be read
> inconsistently.
> To avoid this, we should randomize the temp file name so that each temp file
> uses a different name.
>
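
The proposal in the description can be sketched as follows. This is only an illustration of the idea, not the attached patch: the class `TempNameSketch`, the helper `uniqueTempName`, the `.distcp.tmp.` prefix as used here, and the sample task attempt ID are all assumptions made for the example. It appends a random UUID to the per-attempt prefix, so two files copied by the same task attempt never share a temp name.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

public class TempNameSketch {

    // Hypothetical helper: builds a temp file name that is unique per copied
    // file, not just per task attempt. The prefix is illustrative only.
    static String uniqueTempName(String taskAttemptId) {
        return ".distcp.tmp." + taskAttemptId + "." + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // Illustrative task attempt ID; a real job would supply its own.
        String attempt = "attempt_0001_m_000000_0";

        // Generate one temp name per file and collect them in a set;
        // duplicates would collapse, so the set size shows uniqueness.
        Set<String> names = new HashSet<>();
        for (int i = 0; i < 3; i++) {
            String name = uniqueTempName(attempt);
            names.add(name);
            System.out.println(name);
        }
        System.out.println("distinct names: " + names.size());
    }
}
```

Because each temp object is then a brand-new key, the copy never reads back a key that an earlier file in the same attempt has just overwritten.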