[ https://issues.apache.org/jira/browse/HADOOP-15224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928109#comment-17928109 ]

ASF GitHub Bot commented on HADOOP-15224:
-----------------------------------------

steveloughran commented on PR #7396:
URL: https://github.com/apache/hadoop/pull/7396#issuecomment-2665905814

   Thanks, these are really interesting results.
   
   ## performance difference
   
   Yes, there is some performance difference with large files, but with all the tests running at once it's hard to isolate.
   
   Can you
   * do a full release build, move it out of the source tree and set it up with credentials somehow (or run it in a VM with some)
   * download the latest cloudstore jar (https://github.com/steveloughran/cloudstore) and use its bandwidth command? https://github.com/steveloughran/cloudstore/blob/main/src/main/site/bandwidth.md
   
   something like
   
   ```
   bin/hadoop jar $CLOUDSTORE bandwidth -block 256M -csv results.csv 10G $BUCKET/testfile
   ```
   
   Then repeat with checksums on and off (there's a `-xmlfile` option which takes a path to an XML file of extra settings), and share those CSVs?
   
   Incidentally, presumably the checksum is calculated during the upload. We queue blocks for upload, and if the checksum were calculated at the time the block is queued, it might be more efficient: the thread doing the upload would not be held up, only the worker thread of the application.
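   
   To illustrate the idea (a minimal sketch only, not the PR's implementation; the class and method names below are made up): the block could feed a `MessageDigest` as data is written, so the MD5 is already complete by the time the block is queued and the upload thread only has to attach it.
   
   ```java
   // Sketch only: an incremental MD5 accumulator a data block could own, so the
   // digest is finished by queue time and the upload thread never re-reads the
   // block to hash it. Names here are illustrative, not the PR's.
   import java.security.MessageDigest;
   import java.security.NoSuchAlgorithmException;
   import java.util.Base64;
   
   final class BlockChecksum {
     private final MessageDigest md5;
   
     BlockChecksum() throws NoSuchAlgorithmException {
       this.md5 = MessageDigest.getInstance("MD5");
     }
   
     /** Update the running digest as the caller writes into the block buffer. */
     void update(byte[] buf, int off, int len) {
       md5.update(buf, off, len);
     }
   
     /** Base64 of the final digest, the form the Content-MD5 header expects. */
     String contentMd5() {
       return Base64.getEncoder().encodeToString(md5.digest());
     }
   }
   ```
   
   The queued upload would then just pass `contentMd5()` along with the part upload request instead of hashing the whole block again.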
   
   ## delete throttling
   ```
   [ERROR] Errors:
   [ERROR] ILoadTestS3ABulkDeleteThrottling.test_020_DeleteThrottling
   [INFO]   Run 1: PASS
   [ERROR]   Run 2: software.amazon.awssdk.services.s3.model.S3Exception: 
Please reduce your request rate. (Service: S3, Status Code: 200, Request ID: 
DP0XV0P28HHBMMB0, Extended Request ID: 
QS0YF7VpXRMDpgIsuVCZCavni+uFNTnsCA0pylJxoqXx9DdsGQot698AaQncMHPIO4qs0Fgce8AVRHL6i4V6Hg==)
   [INFO]   Run 3: PASS
   [INFO]   Run 4: PASS
   [INFO]
   ```
   
   This is very interesting, as it shows we aren't detecting, mapping and handling throttle responses from bulk requests. With the V1 SDK the response was always a 503; now we get a 200 and a text message, which is a pain, as string matching for errors is brittle.
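   
   As a rough illustration of what detecting this might have to look like (a sketch only, not S3A's existing translation logic; the "SlowDown" error code and the exact message text are assumptions to verify against the real response):
   
   ```java
   // Sketch: recognise a bulk-delete throttle that surfaces as an S3Exception
   // whose message, rather than its status code, signals throttling.
   import software.amazon.awssdk.awscore.exception.AwsErrorDetails;
   import software.amazon.awssdk.services.s3.model.S3Exception;
   
   final class ThrottleDetection {
     private static final String THROTTLE_TEXT = "Please reduce your request rate";
   
     static boolean isThrottle(S3Exception e) {
       if (e.statusCode() == 503) {        // the classic throttle status
         return true;
       }
       AwsErrorDetails details = e.awsErrorDetails();
       if (details == null) {
         return false;
       }
       // Brittle by design: fall back to the error code / message text when the
       // status code is 200, as in the test run above.
       return "SlowDown".equals(details.errorCode())
           || (details.errorMessage() != null
               && details.errorMessage().contains(THROTTLE_TEXT));
     }
   }
   ```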
   
   * Would you be able to get a full stack trace of that? It should be in the output files of ILoadTestS3ABulkDeleteThrottling; a `mvn surefire-report:report-only` would generate the full html reports, but I'm happy with the raw files.
   * Do you know if this error text is frozen?
   
   Our bulk delete API does rate-limit, but we don't do that for directory delete (yet), as we never have. Maybe we should revisit that.
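   
   A sketch of what that could look like (illustrative only; Guava's `RateLimiter` here stands in for whatever rate-limiting mechanism would actually be wired in, and the rate is a placeholder): throttle each bulk-delete page in a directory delete by acquiring permits proportional to the number of keys.
   
   ```java
   // Illustrative sketch: cap delete throughput by acquiring one permit per key
   // before issuing each bulk-delete page of a directory delete.
   import java.util.List;
   import com.google.common.util.concurrent.RateLimiter;
   
   final class ThrottledBulkDelete {
     // Placeholder rate: keys per second, chosen to stay under the store's limits.
     private final RateLimiter deleteRate = RateLimiter.create(2500.0);
   
     void deletePage(List<String> keys) {
       // Block until enough capacity has accumulated for this page.
       deleteRate.acquire(keys.size());
       // ... issue the DeleteObjects request for these keys ...
     }
   }
   ```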
   
   
   




> build up md5 checksum as blocks are built in S3ABlockOutputStream; validate 
> upload
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-15224
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15224
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>            Assignee: Raphael Azzolini
>            Priority: Minor
>              Labels: pull-request-available
>
> [~rdblue] reports sometimes he sees corrupt data on S3. Given the MD5 checks from 
> upload to S3, it's likelier to have happened in VM RAM, HDD or nearby.
> If the MD5 checksum for each block was built up as data was written to it, 
> and checked against the etag, RAM/HDD storage of the saved blocks could be 
> removed as sources of corruption.
> The obvious place would be 
> {{org.apache.hadoop.fs.s3a.S3ADataBlocks.DataBlock}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
