[ https://issues.apache.org/jira/browse/HADOOP-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051070#comment-17051070 ]

Steve Loughran commented on HADOOP-16900:
-----------------------------------------

bq. I probably should open a separate issue to have distcp compare the file 
lengths before deciding that a path already present in the target location can 
be safely skipped.

I thought it did.

I added the etag -> checksum feature so that a future distcp could track 
checksums at the destination and look for changes there too, but that broke 
distcp, which assumes the source checksum algorithm == the destination's, and 
therefore that the values match at both ends, even across filesystem schemas 
(it holds for hdfs -> webhdfs, roughly).
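
For illustration, here is a minimal sketch of the kind of skip check being 
discussed, written against the public Hadoop FileSystem API. This is not the 
actual DistCp code; it just shows the idea: never skip on a length mismatch, 
and only trust checksums when both ends report the same algorithm.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SkipCheckSketch {
  /**
   * Sketch only: decide whether an existing destination file can be
   * safely skipped instead of re-copied.
   */
  static boolean canSkip(FileSystem srcFs, FileStatus src,
                         FileSystem dstFs, FileStatus dst) throws IOException {
    // A length mismatch always forces a re-copy; this is what would
    // catch a truncated destination object.
    if (src.getLen() != dst.getLen()) {
      return false;
    }
    // getFileChecksum() may return null (e.g. when the store exposes none).
    FileChecksum srcSum = srcFs.getFileChecksum(src.getPath());
    FileChecksum dstSum = dstFs.getFileChecksum(dst.getPath());
    // Checksums are only comparable when both ends use the same algorithm;
    // across filesystem schemas they generally do not.
    if (srcSum != null && dstSum != null
        && srcSum.getAlgorithmName().equals(dstSum.getAlgorithmName())) {
      return srcSum.equals(dstSum);
    }
    // No comparable checksum: fall back to length equality alone.
    return true;
  }
}
{code}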

> Very large files can be truncated when written through S3AFileSystem
> --------------------------------------------------------------------
>
>                 Key: HADOOP-16900
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16900
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.2.1
>            Reporter: Andrew Olson
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: s3
>
> If a written file size exceeds 10,000 * {{fs.s3a.multipart.size}}, a corrupt 
> truncation of the S3 object will occur, as the maximum number of parts in a 
> multipart upload is 10,000 as specified by the S3 API, and there is an 
> apparent bug where this failure is not fatal and the multipart upload is 
> allowed to be marked as completed.
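
To make the arithmetic concrete, here is a hypothetical pre-flight guard 
(illustrative only, not the actual S3A code or fix): with part size P, the 
largest object S3 can accept is 10,000 * P, so e.g. a 128 MB part size caps a 
single object at 1.25 TB.

{code:java}
public class PartCountGuard {
  // S3 hard limit on the number of parts in one multipart upload.
  static final long MAX_PARTS = 10_000L;

  /** Fail fast if the expected size cannot fit in 10,000 parts. */
  static void checkPartCount(long expectedFileSize, long partSize) {
    long parts = (expectedFileSize + partSize - 1) / partSize; // ceiling division
    if (parts > MAX_PARTS) {
      long minPartSize = (expectedFileSize + MAX_PARTS - 1) / MAX_PARTS;
      throw new IllegalArgumentException(
          expectedFileSize + " bytes needs " + parts + " parts of "
          + partSize + " bytes, but S3 allows at most " + MAX_PARTS
          + "; set fs.s3a.multipart.size to at least " + minPartSize + " bytes");
    }
  }
}
{code}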


