[
https://issues.apache.org/jira/browse/HADOOP-16775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Amir Shenavandeh updated HADOOP-16775:
--------------------------------------
Attachment: patch.txt
> Hadoop DistCp reuses the same temp file within the task for different files.
> ----------------------------------------------------------------------------
>
> Key: HADOOP-16775
> URL: https://issues.apache.org/jira/browse/HADOOP-16775
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Affects Versions: 2.0
> Reporter: Amir Shenavandeh
> Priority: Major
> Attachments: patch.txt
>
>
> Hadoop DistCp reuses the same temp file name for every file copied within
> a task attempt and then moves each one to its target name, which is also a
> server-side copy. For copies over S3 this can cause inconsistency, because S3
> guarantees read-after-write consistency only for brand-new objects. The
> contents of overwritten objects on S3 can also be read inconsistently.
> To avoid this, we should randomize the temp file name.
>
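The proposal above can be illustrated with a minimal sketch. It is not the
attached patch.txt and does not reproduce DistCp's actual code: the class name,
the ".distcp.tmp." prefix, and the choice of a UUID suffix are all assumptions
made for illustration. It only contrasts a temp name derived from the task
attempt alone with one that also carries a random per-file suffix.

```java
import java.util.UUID;

// Hypothetical sketch: contrasts a per-attempt temp name with a randomized one.
public class TempNameSketch {
    // Assumed prefix, chosen for illustration only.
    static final String PREFIX = ".distcp.tmp.";

    // Same name for every file copied by a given task attempt,
    // so successive copies overwrite the same temp object.
    static String sharedTempName(String taskAttemptId) {
        return PREFIX + taskAttemptId;
    }

    // A random suffix makes each temp name a brand-new object key.
    static String randomizedTempName(String taskAttemptId) {
        return PREFIX + taskAttemptId + "." + UUID.randomUUID();
    }

    public static void main(String[] args) {
        String attempt = "attempt_0001_m_000000_0"; // made-up attempt id
        System.out.println(sharedTempName(attempt));
        System.out.println(randomizedTempName(attempt));
    }
}
```

With a fresh name per file, each temp object is written exactly once, so the
copy never depends on reading back an overwritten key. A timestamp or counter
suffix would be an alternative to a UUID; which one the patch uses is not
stated in this message.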