[
https://issues.apache.org/jira/browse/HADOOP-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051124#comment-17051124
]
Mukund Thakur commented on HADOOP-16900:
----------------------------------------
{quote}I probably should open a separate issue to have distcp compare the file
lengths before deciding that a path already present in the target location can
be safely skipped.
{quote}
I think DistCp already has a length check in place. FYI:
https://github.com/apache/hadoop/blob/bbd704bb828577a1f0afe5fb0ac358fb7c5af446/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java#L350
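
For illustration only, a minimal sketch of that kind of length comparison; the {{canSkipCopy}} helper below is hypothetical and is not the actual CopyMapper code:

{code:java}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;

public class LengthCheckSketch {
  // Hypothetical helper, in the spirit of the check linked above:
  // skip the copy only when the target exists and its length matches the
  // source, so a truncated target (e.g. a corrupt S3 object) gets re-copied.
  static boolean canSkipCopy(FileStatus source, FileSystem targetFs, Path target)
      throws IOException {
    if (!targetFs.exists(target)) {
      return false;                                  // nothing at the target yet
    }
    FileStatus targetStatus = targetFs.getFileStatus(target);
    return targetStatus.getLen() == source.getLen(); // lengths must agree to skip
  }
}
{code}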
> Very large files can be truncated when written through S3AFileSystem
> --------------------------------------------------------------------
>
> Key: HADOOP-16900
> URL: https://issues.apache.org/jira/browse/HADOOP-16900
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.2.1
> Reporter: Andrew Olson
> Assignee: Steve Loughran
> Priority: Major
> Labels: s3
>
> If a written file's size exceeds 10,000 * {{fs.s3a.multipart.size}}, a corrupt
> truncation of the S3 object will occur, since the maximum number of parts in a
> multipart upload is 10,000 as specified by the S3 API, and there is an apparent
> bug where this failure is not fatal and the multipart upload is allowed to
> be marked as completed.
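
To illustrate the limit described above, a rough sketch of the size ceiling implied by the 10,000-part cap, plus a fail-fast guard; the class, the method names, and the 64 MiB example part size are assumptions for illustration, not S3A code:

{code:java}
public class MultipartLimitSketch {
  // S3's documented cap on parts per multipart upload.
  private static final int MAX_PARTS = 10_000;

  /** Largest object a single multipart upload can hold for a given part size. */
  static long maxUploadSize(long partSizeBytes) {
    return partSizeBytes * MAX_PARTS;
  }

  /** Hypothetical guard: fail fast rather than complete a truncated upload. */
  static void checkPartNumber(int partNumber) {
    if (partNumber > MAX_PARTS) {
      throw new IllegalStateException("Part " + partNumber
          + " exceeds the S3 limit of " + MAX_PARTS
          + " parts; completing the upload would truncate the object");
    }
  }

  public static void main(String[] args) {
    // Example: with a 64 MiB part size the ceiling is 10,000 * 64 MiB = 625 GiB.
    long partSize = 64L * 1024 * 1024;
    System.out.println("Max multipart upload size: "
        + maxUploadSize(partSize) + " bytes");
  }
}
{code}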