ptlrs commented on code in PR #9947:
URL: https://github.com/apache/ozone/pull/9947#discussion_r2984272964
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/DatanodeConfiguration.java:
##########
@@ -404,6 +407,17 @@ public class DatanodeConfiguration extends ReconfigurableConfig {
)
private Duration diskCheckTimeout = DISK_CHECK_TIMEOUT_DEFAULT;
+ @Config(key = DISK_CHECK_RETRY_GAP_KEY,
+ defaultValue = "1m",
Review Comment:
So two checks should be more than sufficient to determine whether we are
failing to open RocksDB. After two checks, we should let the sliding window
decide whether further checks are required on that volume.
If we allow more than two checks, the timeout for each disk check will also
have to become dynamic, and the 10-minute threshold may be too small. On the
other hand, we would like to know within 10 minutes whether a disk is
unhealthy, since prolonging this check pushes future checks further out.
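
To make the trade-off concrete, here is a rough sketch of the arithmetic, not
code from this PR. It assumes the 10-minute threshold is a fixed overall budget
that has to cover every attempt plus the retry gaps between them, and that the
budget is split evenly across attempts. `DiskCheckBudget` and
`perAttemptTimeout` are hypothetical names used only for illustration.

```java
import java.time.Duration;

/** Hypothetical helper, not part of Ozone: splits a fixed budget across retries. */
public final class DiskCheckBudget {
  private DiskCheckBudget() { }

  /**
   * Time available to each attempt once the gaps between attempts are
   * subtracted from the overall budget (assumed even split).
   */
  static Duration perAttemptTimeout(Duration totalBudget, Duration retryGap,
      int attempts) {
    if (attempts < 1) {
      throw new IllegalArgumentException("attempts must be >= 1");
    }
    // There is one gap between each pair of consecutive attempts.
    Duration gaps = retryGap.multipliedBy(attempts - 1L);
    Duration remaining = totalBudget.minus(gaps);
    if (remaining.isNegative() || remaining.isZero()) {
      throw new IllegalArgumentException("retry gaps alone exceed the budget");
    }
    return remaining.dividedBy(attempts);
  }

  public static void main(String[] args) {
    Duration budget = Duration.ofMinutes(10); // assumed overall threshold
    Duration gap = Duration.ofMinutes(1);     // default retry gap from the diff
    for (int attempts = 1; attempts <= 4; attempts++) {
      System.out.println(attempts + " attempt(s): "
          + perAttemptTimeout(budget, gap, attempts) + " each");
    }
  }
}
```

Under these assumptions, two attempts leave 4m30s per check, while four
attempts leave only 1m45s per check, which is why allowing more than two checks
would require either dynamic per-check timeouts or a larger threshold.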
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]