[
https://issues.apache.org/jira/browse/HADOOP-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran resolved HADOOP-17256.
-------------------------------------
Resolution: Duplicate
Caused by HADOOP-8143, which has now been rolled back everywhere it went in. It
can also cause 404 errors, so it was a critical rollback. Closing as a duplicate
of HADOOP-8143.
All future releases of Hadoop branch 3 will contain this fix.
> DistCp -update option will be invalid when distcp files from hdfs to S3
> -----------------------------------------------------------------------
>
> Key: HADOOP-17256
> URL: https://issues.apache.org/jira/browse/HADOOP-17256
> Project: Hadoop Common
> Issue Type: Bug
> Components: tools/distcp
> Reporter: liuxiaolong
> Priority: Major
> Attachments: image-2020-09-10-17-25-46-354.png,
> image-2020-09-10-17-33-50-505.png, image-2020-09-10-17-45-16-998.png,
> image-2020-09-10-17-47-01-653.png, image-2020-09-10-17-52-32-290.png
>
>
> We use distcp with the -update option to copy a directory from HDFS to S3.
> When we run the distcp job a second time, it overwrites the files in the S3
> directory instead of skipping the files that are unchanged.
>
> Test case:
> Run the following command twice; the modification time of the S3 files
> changes on every run.
> hadoop distcp -update /test/ s3a://${s3_bucket}/test/
>
> Relevant code in CopyMapper.java and S3AFileSystem.java:
> (1) On the first run, the distcp job creates the files in S3, but blockSize
> is unused.
> !image-2020-09-10-17-45-16-998.png|width=542,height=485!
>
> (2) On the second run, the distcp job compares the file size and block size
> of each HDFS file against those of the corresponding S3 file.
> !image-2020-09-10-17-47-01-653.png|width=524,height=248!
>
> (3) Because blockSize is unused, getting the block size of an S3 file
> returns a default value.
> In S3AFileSystem.java, the default value of fs.s3a.block.size is
> 32 * 1024 * 1024.
> !image-2020-09-10-17-33-50-505.png|width=451,height=762!
>
> !image-2020-09-10-17-52-32-290.png|width=527,height=87!
>
> The HDFS block size has no real meaning in an object store such as S3, so I
> think there is no need to compare blockSize when running distcp with the
> -update option.
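
The mismatch described in the report can be illustrated with a small,
self-contained sketch. This is not the actual CopyMapper code: the class
UpdateSkipSketch, the method canSkip and the compareBlockSize flag are
hypothetical names used only for illustration, and the 128 MB HDFS block size
is an assumed source value. The 32 MB figure is the fs.s3a.block.size default
quoted in the report.

```java
/**
 * Hypothetical, simplified model of an "-update" skip decision.
 * It only mirrors the length and block-size comparison described in the
 * report; it is not the real DistCp implementation.
 */
final class UpdateSkipSketch {
    // Assumed block size of the source file on HDFS (128 MB).
    static final long HDFS_BLOCK_SIZE = 128L * 1024 * 1024;
    // Default fs.s3a.block.size quoted in the report (32 MB).
    static final long S3A_BLOCK_SIZE = 32L * 1024 * 1024;

    /** Returns true when the target copy is considered up to date. */
    static boolean canSkip(long srcLen, long srcBlockSize,
                           long dstLen, long dstBlockSize,
                           boolean compareBlockSize) {
        boolean sameLength = srcLen == dstLen;
        // When block sizes are compared, an object store that reports a
        // fixed default can never match a differently sized HDFS block.
        boolean sameBlockSize = !compareBlockSize || srcBlockSize == dstBlockSize;
        return sameLength && sameBlockSize;
    }

    public static void main(String[] args) {
        long len = 10L * 1024 * 1024; // same 10 MB file on both sides
        // Block sizes compared: the unchanged file is copied again.
        System.out.println(canSkip(len, HDFS_BLOCK_SIZE, len, S3A_BLOCK_SIZE, true));
        // Block sizes ignored: the unchanged file is skipped.
        System.out.println(canSkip(len, HDFS_BLOCK_SIZE, len, S3A_BLOCK_SIZE, false));
    }
}
```

Under these assumptions, a file of identical length is never skipped while
block sizes are part of the comparison, which matches the repeated overwrites
seen in the test case.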