[
https://issues.apache.org/jira/browse/HADOOP-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051330#comment-17051330
]
Andrew Olson commented on HADOOP-16900:
---------------------------------------
[[email protected]] An adaptive solution like that sounds good to me. Yes,
depending on fs.s3a.fast.upload.buffer and related configuration, a moderate
amount of additional resources could be required, but it should succeed more
often than not.
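Purely as an illustration of what such an adaptive approach could look like
(the class, method and threshold below are hypothetical, not the actual S3A
implementation), the part size could be grown as the upload approaches the
hard S3 limits of 10,000 parts per upload and 5 GB per part; the larger parts
are where the extra buffer resources mentioned above would come from:
{code:java}
// Hypothetical sketch only: grow the part size once the number of uploaded
// parts approaches S3's limit of 10,000 parts per multipart upload.
public final class AdaptivePartSizer {
  private static final int MAX_PARTS = 10_000;          // S3 part-count limit per upload
  private static final long MAX_PART_SIZE = 5L << 30;   // S3 per-part size limit, 5 GB

  private long partSize;

  AdaptivePartSizer(long configuredPartSize) {
    this.partSize = configuredPartSize;                 // e.g. from fs.s3a.multipart.size
  }

  /** Part size for the next part, doubling once 75% of the part budget is used. */
  long nextPartSize(int partsUploadedSoFar) {
    if (partsUploadedSoFar >= MAX_PARTS) {
      throw new IllegalStateException(
          "S3 multipart limit of " + MAX_PARTS + " parts reached");
    }
    if (partsUploadedSoFar > (MAX_PARTS * 3) / 4 && partSize < MAX_PART_SIZE) {
      // Doubling here is what would demand the extra buffer capacity noted above.
      partSize = Math.min(partSize * 2, MAX_PART_SIZE);
    }
    return partSize;
  }
}
{code}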
Since DistCp does have that source-vs-target length check, I'm not sure how our
DistCp job still managed to succeed when this happened. When I have some time
I'll investigate further and try to clear up the mystery. For what it's worth,
we were using the -strategy dynamic option.
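For reference, this is the kind of check I mean, written as a standalone sketch
rather than DistCp's own code: compare the source and target lengths after the
copy and fail if they differ, which should flag a truncated upload.
{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class LengthCheck {

  /** Fail if the target's length does not match the source's after a copy. */
  static void verifySameLength(Path source, Path target, Configuration conf)
      throws IOException {
    FileSystem srcFs = source.getFileSystem(conf);
    FileSystem dstFs = target.getFileSystem(conf);
    long sourceLen = srcFs.getFileStatus(source).getLen();
    long targetLen = dstFs.getFileStatus(target).getLen();
    if (sourceLen != targetLen) {
      throw new IOException("Length mismatch: " + source + " is " + sourceLen
          + " bytes but " + target + " is " + targetLen + " bytes");
    }
  }
}
{code}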
> Very large files can be truncated when written through S3AFileSystem
> --------------------------------------------------------------------
>
> Key: HADOOP-16900
> URL: https://issues.apache.org/jira/browse/HADOOP-16900
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.2.1
> Reporter: Andrew Olson
> Assignee: Steve Loughran
> Priority: Major
> Labels: s3
>
> If a written file's size exceeds 10,000 * {{fs.s3a.multipart.size}}, a corrupt
> truncation of the S3 object will occur, because the maximum number of parts in
> a multipart upload is 10,000 as specified by the S3 API, and there is an
> apparent bug where this failure is not fatal: the multipart upload is allowed
> to be marked as completed.
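To make that threshold concrete (the part sizes below are illustrative values
for {{fs.s3a.multipart.size}}, not a statement of the defaults in use), the
largest object that can be written before the 10,000-part ceiling is hit is
simply 10,000 times the configured part size:
{code:java}
// Worked example: object size at which the 10,000-part ceiling is reached,
// for a few illustrative part sizes.
public final class TruncationThreshold {
  private static final long MAX_PARTS = 10_000L;

  public static void main(String[] args) {
    long[] partSizesMb = {64, 100, 128};             // illustrative fs.s3a.multipart.size values
    for (long mb : partSizesMb) {
      long capBytes = MAX_PARTS * mb * 1024 * 1024;  // parts * part size
      System.out.printf("part size %d MB -> truncation past %.1f GB%n",
          mb, capBytes / (1024.0 * 1024 * 1024));
    }
  }
}
{code}
For a 64 MB part size that works out to 625 GB, so any object larger than that
is at risk of silent truncation unless the part size adapts or the upload fails
fast.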