sodonnel commented on code in PR #9351:
URL: https://github.com/apache/ozone/pull/9351#discussion_r2564665208


##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/volume/TestDatanodeHddsVolumeFailureDetection.java:
##########
@@ -156,7 +156,8 @@ void corruptContainerFile(boolean schemaV3) throws Exception {
         // refer to HddsVolume.check()
         DatanodeTestUtils.simulateBadVolume(vol0);
 
-        // close container to trigger checkVolumeAsync
+        // close container to trigger checkVolumeAsync after 2 seconds as minGap to check
+        Thread.sleep(2000);

Review Comment:
   In the test code:
   
   ```
         try {
           DatanodeTestUtils.injectContainerMetaDirFailure(metadataDir);
   
           // simulate bad volume by removing write permission on root dir
           // refer to HddsVolume.check()
           DatanodeTestUtils.simulateBadVolume(vol0);
   
           // close container to trigger checkVolumeAsync
           assertThrows(IOException.class, c1::close);
   ```
   `simulateBadVolume` is DN-side code, and `c1::close` is DN code as well, so
   there should be no need to sleep between these two calls. I don't see how
   the change in this PR has triggered this problem, and in general we need to
   come up with a better way of dealing with delays than putting a sleep in.
   At some point the 2 seconds will not be enough, and the test will fail
   intermittently. If this failure is caused by this PR, it needs a bit more
   investigation to see what is going on.
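
To illustrate the kind of alternative to a fixed sleep meant here, below is a minimal sketch of a poll-with-timeout helper. `WaitUtil` and `waitFor` are hypothetical names invented for this example and are not taken from the PR; the project's own test utilities may already provide an equivalent. The condition polled in `main` is a stand-in, since the actual condition the test should wait on still needs to be identified.

```java
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of the PR): poll a condition until it holds
// or a deadline passes, instead of sleeping for a fixed amount of time.
final class WaitUtil {
  private WaitUtil() { }

  static void waitFor(BooleanSupplier condition, Duration pollInterval,
      Duration timeout) throws InterruptedException, TimeoutException {
    // nanoTime is monotonic, so the deadline is immune to wall-clock changes.
    long deadline = System.nanoTime() + timeout.toNanos();
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("Condition not met within " + timeout);
      }
      Thread.sleep(pollInterval.toMillis());
    }
  }

  public static void main(String[] args) throws Exception {
    long start = System.nanoTime();
    // Stand-in condition: becomes true roughly 50 ms after start.
    waitFor(() -> System.nanoTime() - start > 50_000_000L,
        Duration.ofMillis(10), Duration.ofSeconds(5));
    System.out.println("condition met");
  }
}
```

With a helper like this, the test returns as soon as the condition holds and fails with a clear timeout otherwise, so the wait is bounded by a generous upper limit rather than tuned to a guess such as 2 seconds.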



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

