[ https://issues.apache.org/jira/browse/HADOOP-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051067#comment-17051067 ]

Steve Loughran commented on HADOOP-16900:
-----------------------------------------

I was thinking about this and came up with a very cunning solution:

As the number of blocks written increases, we expand the size of each block.

e.g. up to 4000 blocks, you get the fs.s3a block size value; then it multiplies 
by 4x every thousand blocks, so at block size * 4^5 for the last 1000 of the 
10K you are at 32GB per block.
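
Roughly like this (just a sketch, not the real S3ABlockOutputStream code: the 
method and constant names are made up, and the thresholds below are one way to 
make the numbers line up -base size until part 5000, then 4x per thousand, so 
the last thousand is at 4^5):

{code:java}
// Sketch only: hypothetical helper, not actual S3A code.
// Early parts use the configured block size; after that the size is
// multiplied by 4 for every further thousand parts, so parts 9001..10000
// are at blockSize * 4^5 (32GB for a 32MB block).
static long scaledBlockSize(long baseBlockSize, int partNumber) {
  final int maxParts = 10_000;        // S3 multipart upload hard limit
  if (partNumber < 1 || partNumber > maxParts) {
    throw new IllegalArgumentException("part number out of range: " + partNumber);
  }
  // which thousand this part falls into: parts 1..1000 -> 0, 9001..10000 -> 9
  int thousand = (partNumber - 1) / 1_000;
  // no scaling for the first few thousand parts, then 4x per thousand
  int exponent = Math.max(0, thousand - 4);
  // 4^exponent == 1 << (2 * exponent)
  return baseBlockSize << (2 * exponent);
}
{code}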


What do people think? It would have some consequences (uploads use more memory), 
but it would get the data up even for very large source files.


We should also look at the part size on copy requests: we need to make sure 
that the part size we tell the xfer manager to split a file into also keeps 
the part count within the 10K limit.
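
Something like this (again just a sketch, not the actual copy code; the names 
are mine):

{code:java}
// Sketch: choose a copy part size which keeps the part count within the
// S3 limit of 10,000 parts, never dropping below the configured part size.
static long copyPartSize(long fileSize, long configuredPartSize) {
  final long maxParts = 10_000L;
  // smallest part size that still fits the whole object into maxParts parts
  long minPartSize = (fileSize + maxParts - 1) / maxParts;   // ceiling division
  return Math.max(configuredPartSize, minPartSize);
}
{code}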


> Very large files can be truncated when written through S3AFileSystem
> --------------------------------------------------------------------
>
>                 Key: HADOOP-16900
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16900
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.2.1
>            Reporter: Andrew Olson
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: s3
>
> If a written file's size exceeds 10,000 * {{fs.s3a.multipart.size}}, the S3 
> object will be truncated and corrupted: the maximum number of parts in a 
> multipart upload is 10,000, as specified by the S3 API, and there is an 
> apparent bug where this failure is not fatal, so the multipart upload is 
> allowed to be marked as completed.


