[ https://issues.apache.org/jira/browse/HADOOP-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051070#comment-17051070 ]
Steve Loughran commented on HADOOP-16900:
-----------------------------------------
bq. I probably should open a separate issue to have distcp compare the file
lengths before deciding that a path already present in the target location can
be safely skipped.
I thought it did.
I added the etag -> checksum feature so that a future distcp could track
checksums at the destination and look for changes there too, but that broke
distcp, which assumes that the source and destination use the same checksum
algorithm, and therefore that the values match at both ends, even across
filesystem schemes (this roughly holds for hdfs -> webhdfs).
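Below is a minimal sketch of the skip check discussed in this comment. It is not distcp's actual logic, and it uses a hypothetical FileInfo holder in place of any real Hadoop class. Lengths are always compared. Checksums are compared only when both ends report the same algorithm name, so an etag-derived checksum is never matched against an HDFS one. In Hadoop the real inputs would come from FileStatus#getLen() and FileSystem#getFileChecksum(Path). The names FileInfo, SkipCheck and canSkip() are illustrative only.

```java
// Hypothetical model (not a Hadoop class) of what a copy tool knows about one
// file: its length and, optionally, a checksum tagged with the name of the
// algorithm that produced it.
final class FileInfo {
    final long length;
    final String checksumAlgorithm; // null if the store reports no checksum
    final byte[] checksum;          // null if the store reports no checksum

    FileInfo(long length, String checksumAlgorithm, byte[] checksum) {
        this.length = length;
        this.checksumAlgorithm = checksumAlgorithm;
        this.checksum = checksum;
    }
}

// Hypothetical skip check: returns true only if the destination can be
// treated as an up-to-date copy of the source.
final class SkipCheck {
    private SkipCheck() {}

    static boolean canSkip(FileInfo src, FileInfo dest) {
        // A length mismatch (e.g. a truncated upload) always forces a re-copy.
        if (src.length != dest.length) {
            return false;
        }
        // Checksums are only comparable if the same algorithm produced both;
        // otherwise fall back to the length comparison alone.
        if (src.checksumAlgorithm == null || dest.checksumAlgorithm == null
                || !src.checksumAlgorithm.equals(dest.checksumAlgorithm)) {
            return true;
        }
        return java.util.Arrays.equals(src.checksum, dest.checksum);
    }
}
```

Falling back to a length-only comparison when the algorithms differ is a design choice made for this sketch. A stricter tool could instead refuse to skip, and re-copy, whenever checksums cannot be compared.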
> Very large files can be truncated when written through S3AFileSystem
> --------------------------------------------------------------------
>
> Key: HADOOP-16900
> URL: https://issues.apache.org/jira/browse/HADOOP-16900
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.2.1
> Reporter: Andrew Olson
> Assignee: Steve Loughran
> Priority: Major
> Labels: s3
>
> If a written file's size exceeds 10,000 * {{fs.s3a.multipart.size}}, the
> resulting S3 object is corrupted by truncation: a multipart upload can have
> at most 10,000 parts, as specified by the S3 API, and there is an apparent
> bug whereby exceeding this limit is not treated as fatal, so the multipart
> upload is still allowed to be marked as completed.
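The limit in the description is simple arithmetic. With a part size P, one upload can hold at most 10,000 * P bytes. A 64 MiB part size (an example value, not necessarily the default) therefore caps an object at 640,000 MiB, i.e. 625 GiB. The sketch below computes that cap and shows the fail-fast behaviour the description implies: reject part 10,001 rather than completing the upload. It uses hypothetical names and is not S3AFileSystem code.

```java
// Hypothetical sketch (not S3AFileSystem code): fail fast when a write would
// need more multipart-upload parts than S3 allows, instead of completing a
// truncated upload.
final class PartLimitCheck {
    // S3 allows at most 10,000 parts in one multipart upload.
    static final int MAX_PARTS = 10_000;

    private PartLimitCheck() {}

    // Largest object that can be uploaded with the given part size.
    static long maxObjectSize(long partSize) {
        return Math.multiplyExact(partSize, (long) MAX_PARTS);
    }

    // Number of parts needed to upload `length` bytes, rounding up.
    static long partsNeeded(long length, long partSize) {
        if (partSize <= 0) {
            throw new IllegalArgumentException("partSize must be positive");
        }
        // Round up without risking overflow on very large lengths.
        return length / partSize + (length % partSize == 0 ? 0 : 1);
    }

    // Call before uploading part `partNumber` (1-based); throws rather than
    // letting the upload continue past the limit.
    static void checkPartNumber(int partNumber) throws java.io.IOException {
        if (partNumber > MAX_PARTS) {
            throw new java.io.IOException("Part " + partNumber
                + " exceeds the S3 limit of " + MAX_PARTS + " parts per upload");
        }
    }
}
```

Checking the part number before each upload turns the silent truncation into an explicit write failure, which is the behaviour the report asks for.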
--
This message was sent by Atlassian Jira
(v8.3.4#803005)