[https://issues.apache.org/jira/browse/HADOOP-15224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927616#comment-17927616]

Raphael Azzolini commented on HADOOP-15224:
-------------------------------------------

[[email protected]] I published the PR above for adding the checksum option.

I executed the scale tests with different {{fs.s3a.scale.test.huge.filesize}} 
values (default, 1G, and 10G). With a checksum enabled I could see a difference 
in completion time at 10G, but 1G and the default size took about the same time 
to complete as when no checksum algorithm was set. I attached a file to the PR 
with the test results.
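
For reference, a run like the one above can be invoked roughly as follows (a sketch assuming a configured {{hadoop-aws}} checkout with test credentials set up; the {{-Dscale}} flag enables the scale tests, and the exact invocation may differ per branch):

```shell
# Run the S3A scale tests with a 10G huge-file size
# (property name taken from the comment above)
mvn verify -Dscale -Dfs.s3a.scale.test.huge.filesize=10G
```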

Let me know if you want me to test any other combination.

Regarding other tests, the only integration test that failed with the checksum 
enabled was {{ITestCustomSigner}}; it looks like the custom signer doesn't set 
the checksum header. I couldn't find a way to tell the signer to create the 
checksum, so I unset the checksum for this test.

> build up MD5 checksum as blocks are built in S3ABlockOutputStream; validate 
> upload
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-15224
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15224
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>            Assignee: Raphael Azzolini
>            Priority: Minor
>              Labels: pull-request-available
>
> [~rdblue] reports sometimes he sees corrupt data on S3. Given MD5 checks from 
> upload to S3, it's likelier to have happened in VM RAM, HDD, or nearby.
> If the MD5 checksum for each block was built up as data was written to it, 
> and checked against the etag, RAM/HDD storage of the saved blocks could be 
> removed as sources of corruption.
> The obvious place would be 
> {{org.apache.hadoop.fs.s3a.S3ADataBlocks.DataBlock}}
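
The incremental-digest idea in the description can be sketched as below. This is a hypothetical standalone helper, not the actual {{S3ADataBlocks.DataBlock}} code; it only illustrates updating an MD5 digest as data is written, producing a hex digest that can be compared against the ETag S3 returns for a single-part PUT (which is the hex MD5 of the object body; multipart ETags differ).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: accumulate an MD5 digest as bytes are written to a
// block, so no second pass over RAM/HDD-buffered data is needed to verify
// the upload.
public class IncrementalMd5Block {
    private final MessageDigest digest;

    public IncrementalMd5Block() {
        try {
            this.digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on all conformant JREs
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Update the running digest with each chunk as it is buffered.
    public void write(byte[] chunk, int off, int len) {
        digest.update(chunk, off, len);
    }

    // Final hex digest, comparable to the ETag of a single-part upload.
    public String hexDigest() {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

In the real stream the digest would be fed from the same write path that fills the block buffer, and the comparison done against the ETag in the upload response.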



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
