sumitagrawl commented on code in PR #9351:
URL: https://github.com/apache/ozone/pull/9351#discussion_r2580155539


##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/dn/volume/TestDatanodeHddsVolumeFailureDetection.java:
##########
@@ -156,7 +156,8 @@ void corruptContainerFile(boolean schemaV3) throws Exception {
         // refer to HddsVolume.check()
         DatanodeTestUtils.simulateBadVolume(vol0);
 
-        // close container to trigger checkVolumeAsync
+        // close container to trigger checkVolumeAsync after 2 seconds as minGap to check
+        Thread.sleep(2000);

Review Comment:
   @ashishkumar50 @sodonnel 
   In the test MiniOzone cluster, diskCheckMinGap is configured as 2 seconds:
   
   `org.apache.hadoop.ozone.dn.volume.TestDatanodeHddsVolumeFailureDetection#newCluster --> dnConf.setDiskCheckMinGap(Duration.ofSeconds(2));`
   
   So if the gap between the previous volume check and the next one is less than 2 seconds, the next check is skipped, even when the volume has been simulated as bad in between. In this test case the volume checker is triggered dynamically on failure and is expected to report that failure.
   
   In the test case, it was observed that a volume check runs as a default run before the volume is marked bad, and the next run, the one that should detect the failure, starts less than 2 seconds later, after the volume has been simulated bad. This was the cause of the random failure: the volume checker is triggered, but the check is ignored because of this restriction.
   
   This has started showing up now because HB registration is quicker and the DN becomes available for key creation faster.
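
   To make the timing issue concrete, here is a minimal sketch of a min-gap throttle of the kind described above. It is an illustration under assumptions, not the Ozone volume checker code: the class `MinGapThrottle` and the method `tryRun` are hypothetical names, and time is passed in explicitly so the behaviour is deterministic.

```java
import java.time.Duration;

/**
 * Hypothetical illustration of a min-gap throttle. This is NOT the Ozone
 * volume checker; the class and method names are assumptions for this sketch.
 */
public class MinGapThrottle {
  private final long minGapMillis;
  private long lastRunMillis;
  private boolean hasRun;

  public MinGapThrottle(Duration minGap) {
    this.minGapMillis = minGap.toMillis();
  }

  /**
   * Returns true if a check is allowed to run at the given time, or false if
   * it is skipped because less than the min gap has passed since the last run.
   */
  public synchronized boolean tryRun(long nowMillis) {
    if (hasRun && nowMillis - lastRunMillis < minGapMillis) {
      return false; // skipped: still inside the min gap
    }
    hasRun = true;
    lastRunMillis = nowMillis;
    return true;
  }

  public static void main(String[] args) {
    MinGapThrottle throttle = new MinGapThrottle(Duration.ofSeconds(2));
    // Default run, before the volume is marked bad.
    System.out.println(throttle.tryRun(0));    // true
    // Volume is now bad, but the next check comes 500 ms later and is skipped.
    System.out.println(throttle.tryRun(500));  // false
    // Once 2 seconds have passed since the last run, the check runs again.
    System.out.println(throttle.tryRun(2000)); // true
  }
}
```

   Under this model, the second call corresponds to the failing case in the test: the check is requested after the volume is bad, but it falls inside the 2 second gap and never runs. Waiting out the gap before triggering the check, as the `Thread.sleep(2000)` in the diff does, corresponds to the third call.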



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
