Ivan Andika created HDDS-10717:
----------------------------------

             Summary: nodeFailureTimeoutMs should be initialized before 
syncTimeoutRetry
                 Key: HDDS-10717
                 URL: https://issues.apache.org/jira/browse/HDDS-10717
             Project: Apache Ozone
          Issue Type: Bug
          Components: DN, Ozone Datanode
    Affects Versions: 1.4.0
            Reporter: Ivan Andika
            Assignee: Ivan Andika


The Ratis WriteLog retry count was found to be "0/0", which means WriteLog 
is never retried; instead, the datanode triggers a pipeline failure and 
closes the pipeline. This might explain why the datanodes send so many 
pipeline close events during periods of high IO.

The cause is that nodeFailureTimeoutMs is initialized after 
newRaftProperties and setStateMachineDataConfigurations are called, so 
syncTimeoutRetry is calculated before nodeFailureTimeoutMs has been set.

The ordering needs to be fixed so that syncTimeoutRetry is calculated 
correctly (default: 30 retries).
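
The ordering problem can be illustrated with a small standalone Java sketch. 
This is not the Ozone code: the class InitOrderSketch, its Server stand-in, 
the helper computeSyncTimeoutRetry, and the 300 s / 10 s timeout values are 
all hypothetical, chosen only so that the quotient matches the default of 30 
mentioned above. It also assumes the retry count is obtained by dividing the 
node failure timeout by a sync timeout, which the description implies but 
does not spell out.

```java
import java.util.concurrent.TimeUnit;

public class InitOrderSketch {
    // Assumed values for illustration only; 300 s / 10 s gives 30 retries.
    static final long NODE_FAILURE_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(300);
    static final long SYNC_TIMEOUT_MS = TimeUnit.SECONDS.toMillis(10);

    /** Hypothetical stand-in for the server that owns both settings. */
    static class Server {
        // A long field holds 0 until it is explicitly assigned.
        private long nodeFailureTimeoutMs;
        private final int syncTimeoutRetry;

        Server(boolean initTimeoutFirst) {
            if (initTimeoutFirst) {
                // Fixed order: set the timeout, then derive the retry count.
                nodeFailureTimeoutMs = NODE_FAILURE_TIMEOUT_MS;
                syncTimeoutRetry = computeSyncTimeoutRetry();
            } else {
                // Buggy order: the retry count is derived while the
                // timeout field still holds its default value of 0.
                syncTimeoutRetry = computeSyncTimeoutRetry();
                nodeFailureTimeoutMs = NODE_FAILURE_TIMEOUT_MS;
            }
        }

        // Assumed formula: retries = node failure timeout / sync timeout.
        private int computeSyncTimeoutRetry() {
            return (int) (nodeFailureTimeoutMs / SYNC_TIMEOUT_MS);
        }

        int getSyncTimeoutRetry() {
            return syncTimeoutRetry;
        }
    }

    public static void main(String[] args) {
        System.out.println("buggy order: "
            + new Server(false).getSyncTimeoutRetry());
        System.out.println("fixed order: "
            + new Server(true).getSyncTimeoutRetry());
    }
}
```

Under these assumptions the buggy ordering yields a retry count of 0 and the 
fixed ordering yields 30, which is consistent with the "0/0" symptom 
described above.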



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
