ptlrs commented on code in PR #9947:
URL: https://github.com/apache/ozone/pull/9947#discussion_r2984272964


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java:
##########
@@ -404,6 +407,17 @@ public class DatanodeConfiguration extends ReconfigurableConfig {
   )
   private Duration diskCheckTimeout = DISK_CHECK_TIMEOUT_DEFAULT;
 
+  @Config(key = DISK_CHECK_RETRY_GAP_KEY,
+      defaultValue = "1m",

Review Comment:
   So, two checks should be more than sufficient to determine definitively
whether we are failing to open RocksDB. After two checks, we should let the
sliding window decide whether further checks are required on that volume.
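   For illustration, the two-check policy above might look roughly like the
following Java sketch. `BoundedDiskCheck`, `checkWithRetry` and `MAX_ATTEMPTS`
are hypothetical names, the `BooleanSupplier` stands in for whatever actually
tries to open RocksDB, and the `false` result is assumed to feed the existing
sliding-window tracking. None of this is taken from the PR itself.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of a bounded disk check: run the check at most
 * MAX_ATTEMPTS times, waiting retryGap between attempts, then return the
 * verdict so that sliding-window tracking can decide what happens next.
 */
final class BoundedDiskCheck {
  // Assumption: the two-check limit proposed in this review.
  static final int MAX_ATTEMPTS = 2;

  private BoundedDiskCheck() {
  }

  /**
   * Returns true as soon as one attempt succeeds, or false once every
   * attempt has failed (or the wait between attempts was interrupted).
   */
  static boolean checkWithRetry(BooleanSupplier check, Duration retryGap) {
    for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
      if (check.getAsBoolean()) {
        return true;  // e.g. RocksDB opened successfully on this attempt
      }
      if (attempt < MAX_ATTEMPTS) {
        try {
          Thread.sleep(retryGap.toMillis());  // gap before the next attempt
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();  // preserve interrupt status
          return false;  // give up without the remaining attempts
        }
      }
    }
    // All attempts failed: report the failure and leave any further checks
    // on this volume to the sliding window.
    return false;
  }
}
```

   Capping the loop at two attempts keeps the worst case bounded at two checks
plus one retry gap, after which the volume is handed back to the sliding window.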
   
   If we allow more than two checks, then the timeout for each disk check will
also have to become dynamic, and the 10-minute threshold may be too small. On
the other hand, we would like to know within 10 minutes whether a disk is
unhealthy, since lengthening this check pushes future checks further out.
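   A rough way to see the timeout concern is to split one overall budget across
the attempts and the gaps between them. This sketch assumes the 10-minute
threshold bounds the whole multi-attempt check and uses the `1m` default from
the diff as the retry gap; `DiskCheckBudget` and `perAttemptBudget` are
hypothetical names.

```java
import java.time.Duration;

/**
 * Hypothetical arithmetic sketch: if n attempts separated by a fixed retry gap
 * must all complete within one overall budget, how long may each attempt take?
 */
final class DiskCheckBudget {
  private DiskCheckBudget() {
  }

  /**
   * Per-attempt time available when n attempts and (n - 1) gaps share the
   * budget. Returns Duration.ZERO if the gaps alone exhaust the budget.
   */
  static Duration perAttemptBudget(Duration total, Duration retryGap, int attempts) {
    if (attempts < 1) {
      throw new IllegalArgumentException("attempts must be >= 1");
    }
    // Time left for the checks themselves after subtracting all retry gaps.
    Duration forChecks = total.minus(retryGap.multipliedBy(attempts - 1L));
    if (forChecks.isNegative() || forChecks.isZero()) {
      return Duration.ZERO;
    }
    return forChecks.dividedBy(attempts);
  }

  public static void main(String[] args) {
    Duration total = Duration.ofMinutes(10);  // threshold from the review comment
    Duration gap = Duration.ofMinutes(1);     // defaultValue "1m" in the diff
    for (int n = 1; n <= 5; n++) {
      System.out.println(n + " attempt(s): " + perAttemptBudget(total, gap, n));
    }
  }
}
```

   Running `main` prints PT10M, PT4M30S, PT2M40S, PT1M45S and PT1M12S for one
through five attempts: each extra attempt shrinks the time available per check,
which is why allowing more than two checks would push toward dynamic per-check
timeouts.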



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
