[ 
https://issues.apache.org/jira/browse/HADOOP-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051124#comment-17051124
 ] 

Mukund Thakur commented on HADOOP-16900:
----------------------------------------

{quote}I probably should open a separate issue to have distcp compare the file 
lengths before deciding that a path already present in the target location can 
be safely skipped.
{quote}
I think DistCp already has a length check in place. FYI:

https://github.com/apache/hadoop/blob/bbd704bb828577a1f0afe5fb0ac358fb7c5af446/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java#L350
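
To illustrate the kind of check being discussed, here is a minimal sketch of a length-based skip decision. It is not the DistCp code at the link above: the {{SkipCheck}} class, the {{FileInfo}} type, and the {{canSkip}} signature are hypothetical stand-ins, and a real copy tool may compare more than the length.

```java
/** Hypothetical sketch: decide whether a file already at the target can be skipped. */
final class SkipCheck {

    /** Hypothetical stand-in for the metadata a copy tool holds about one file. */
    static final class FileInfo {
        final String path;
        final long length;

        FileInfo(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /**
     * Returns true only if the target exists and has the same length as the source.
     * A null target means nothing exists at the target path yet.
     */
    static boolean canSkip(FileInfo source, FileInfo target) {
        if (target == null) {
            return false; // nothing to skip, the file must be copied
        }
        // A truncated target has a different length, so it is copied again.
        return source.length == target.length;
    }

    public static void main(String[] args) {
        FileInfo source = new FileInfo("/src/big.dat", 1_000L);
        FileInfo truncated = new FileInfo("/dst/big.dat", 400L);
        System.out.println(canSkip(source, truncated)); // false
    }
}
```

With a check of this shape, a target object that was truncated during upload would not be treated as already copied, because its length differs from the source.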

> Very large files can be truncated when written through S3AFileSystem
> --------------------------------------------------------------------
>
>                 Key: HADOOP-16900
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16900
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.2.1
>            Reporter: Andrew Olson
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: s3
>
> If a written file size exceeds 10,000 * {{fs.s3a.multipart.size}}, the S3 
> object is truncated and corrupt. The S3 API limits a multipart upload to 
> 10,000 parts, and there is an apparent bug where exceeding this limit is 
> not fatal, so the multipart upload is allowed to be marked as completed.
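
The size limit described in the issue can be worked through with a little arithmetic. The sketch below is only an illustration: the {{MultipartLimit}} class and its method names are hypothetical, and the 64 MiB part size is an example value, not a statement about any cluster's {{fs.s3a.multipart.size}} setting.

```java
/** Hypothetical sketch of the 10,000-part arithmetic described in the issue. */
final class MultipartLimit {

    /** Maximum number of parts in one S3 multipart upload, per the issue text. */
    static final long MAX_PARTS = 10_000L;

    /** Largest object size, in bytes, that fits in one upload at this part size. */
    static long maxObjectSize(long partSizeBytes) {
        requirePositive(partSizeBytes);
        return Math.multiplyExact(MAX_PARTS, partSizeBytes);
    }

    /** Number of parts needed for a file, rounding up for a final partial part. */
    static long partsNeeded(long fileSizeBytes, long partSizeBytes) {
        requirePositive(partSizeBytes);
        if (fileSizeBytes < 0) {
            throw new IllegalArgumentException("file size must not be negative");
        }
        // Written without the usual "a + b - 1" trick so it cannot overflow.
        return fileSizeBytes / partSizeBytes
                + (fileSizeBytes % partSizeBytes == 0 ? 0 : 1);
    }

    /** True if the file can be uploaded without exceeding the part limit. */
    static boolean fitsInOneUpload(long fileSizeBytes, long partSizeBytes) {
        return partsNeeded(fileSizeBytes, partSizeBytes) <= MAX_PARTS;
    }

    private static void requirePositive(long partSizeBytes) {
        if (partSizeBytes <= 0) {
            throw new IllegalArgumentException("part size must be positive");
        }
    }

    public static void main(String[] args) {
        long partSize = 64L * 1024 * 1024; // example value: 64 MiB
        System.out.println(maxObjectSize(partSize)); // 671088640000 bytes (625 GiB)
    }
}
```

With a 64 MiB part size, any file larger than 671,088,640,000 bytes needs more than 10,000 parts, which is the condition under which the truncation was reported.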



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
