steveloughran commented on issue #1826: HADOOP-16823. Large DeleteObject requests are their own Thundering Herd
URL: https://github.com/apache/hadoop/pull/1826#issuecomment-584690703
 
 
   Did checkstyle fixes and a diff against trunk to (a) reduce the diff and (b) see where the javadocs needed improving; mainly the RetryingCollection.
   
   I got a failure on a `-Dscale` auth run:
   
   ```
   [ERROR]   ITestS3AContractRootDir>AbstractContractRootDirectoryTest.testRecursiveRootListing:257->Assert.assertTrue:41->Assert.fail:88 files mismatch: between
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-1"
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-25"
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-16"
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-11"
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-7"
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-54"
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-14"
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-35"
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-48"
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-56"
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-29"
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-52"
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-40"
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-2"
       "s3a://hwdev-steve-ireland-new/fork-0003/test/testBulkRenameAndDelete/src/file-24"
       "s3a:
   ```
   
   Now, I've been playing with older branch-2 versions recently and could blame that, but "bulk" and "delete" describe exactly what I was working on in this patch.
   
   It wasn't this patch's fault, but while working on these tests, with the better renames, I managed to create a deadlock in the new code:
   
   1. S3ABlockOutputStream was waiting for space in the bounded thread pool so it could do an async PUT.
   1. But that thread pool was blocked by threads waiting for their async directory operations to complete.
   1. Outcome: total deadlock.
   
   Surfaced in ITestS3ADeleteManyFiles during parallel file creation.
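   
   To make the cycle concrete, here is a minimal, self-contained sketch of that pattern. It is not the S3A code (the class and method names are made up); it just shows how work running on a bounded pool can block on follow-up tasks queued behind it on the same pool, so nothing ever completes:
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.Future;
   
   public class BoundedPoolDeadlock {
     public static void main(String[] args) throws Exception {
       // stand-in for the bounded upload pool
       ExecutorService bounded = Executors.newFixedThreadPool(2);
   
       List<Future<?>> writers = new ArrayList<>();
       for (int i = 0; i < 2; i++) {
         writers.add(bounded.submit(() -> {
           // analogue of a write completing: it submits its own follow-up
           // work (directory cleanup) to the SAME bounded pool...
           Future<?> cleanup = bounded.submit(
               () -> System.out.println("delete parent dir markers"));
           try {
             // ...and waits for it. Every pool thread is parked here,
             // so the cleanup tasks never get a thread: total deadlock.
             cleanup.get();
           } catch (Exception e) {
             throw new RuntimeException(e);
           }
         }));
       }
       for (Future<?> w : writers) {
         w.get();   // hangs forever
       }
     }
   }
   ```
   
   Running that hangs on the final `get()` calls, which is the same shape of failure the test surfaced.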
   
   Actions
   * remove the async stuff from the end of rename()
   * keep the dir marker delete operations in finishedWrite() async, but use the unbounded thread pool (see the sketch after this list).
   * Clean up and enhance ITestS3ADeleteManyFiles so that it tests the src and dest paths more rigorously, and sets a page size of 50 for better coverage of the paged rename sequence.
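   
   For the second action, here is a rough sketch of the shape of the fix, with illustrative names rather than the actual S3AFileSystem internals: keep the bounded pool for data uploads only, and push the post-write directory marker cleanup onto an unbounded pool, so a burst of writes can never starve (or be starved by) its own housekeeping:
   
   ```java
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   
   /** Illustrative only: the real logic lives in S3AFileSystem. */
   class WriteFinisher {
     // bounded pool: reserved for block uploads
     private final ExecutorService boundedUploadPool =
         Executors.newFixedThreadPool(4);
     // unbounded pool: cheap housekeeping that must never
     // compete with uploads for threads
     private final ExecutorService unboundedPool =
         Executors.newCachedThreadPool();
   
     void finishedWrite(String key) {
       // delete any fake parent directory markers asynchronously;
       // because this goes to the unbounded pool it cannot block,
       // or be blocked by, the upload pool
       unboundedPool.submit(() -> deleteParentDirectoryMarkers(key));
     }
   
     private void deleteParentDirectoryMarkers(String key) {
       System.out.println("deleting directory markers above " + key);
     }
   }
   ```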
   
   Makes me think we should do more parallel IO tests within the same process.
