Ivan Andika created HDDS-13819:
----------------------------------

             Summary: Add a short wait before starting service after Ratis leadership change
                 Key: HDDS-13819
                 URL: https://issues.apache.org/jira/browse/HDDS-13819
             Project: Apache Ozone
          Issue Type: Improvement
            Reporter: Ivan Andika
            Assignee: Ivan Andika


Currently, isLeaderReady is used to check whether an internal service should be
started, to prevent multiple instances of a service from running at the same
time. When the Ratis group is working normally (no network partitions, etc.),
this check should be sufficient, since there should be only one leader.
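For illustration, the current gating can be sketched roughly as follows. This is a hypothetical, simplified model, not Ozone code: LeaderGatedTask is an invented class, and the real isLeaderReady check is stood in for by a BooleanSupplier. What it shows is that leadership is checked once, at a single instant, before the work runs.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not Ozone code: a background-service run that is
// gated by a single point-in-time leadership check.
final class LeaderGatedTask {
  private final BooleanSupplier isLeaderReady; // stand-in for the real check
  private final Runnable work;
  private int runs = 0;

  LeaderGatedTask(BooleanSupplier isLeaderReady, Runnable work) {
    this.isLeaderReady = isLeaderReady;
    this.work = work;
  }

  /** Runs one iteration; returns true if the work was executed. */
  boolean runOnce() {
    // Checked once: if two nodes both believe they are leader at this
    // instant, both of them execute the work.
    if (!isLeaderReady.getAsBoolean()) {
      return false;
    }
    work.run();
    runs++;
    return true;
  }

  int getRuns() {
    return runs;
  }
}
```

With two such tasks whose suppliers both return true (two nodes that each believe they are the leader), both runOnce() calls execute the work.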

However, there may be a small window (within raft.server.rpc.timeout.max,
which defaults to 300 ms) during which two OM or SCM nodes both believe they
are the leader, before one of them steps down with LOST_MAJORITY_HEARTBEATS.
During this period, two instances of the same service may run at the same
time, and both can update the OM / SCM state.

One option is to add a short sleep before starting the service and to check
leadership again before starting each service run. We should also interrupt
the background service if there is a leadership change.
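A minimal sketch of the proposed approach, assuming one single-threaded executor per service. LeaderAwareServiceRunner, startRun and onLeadershipLost are hypothetical names, not existing Ozone or Ratis APIs; the settle delay is assumed to be configured at or above raft.server.rpc.timeout.max so that a stale leader has time to step down before the new leader's service starts.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not Ozone code: settle delay plus a leadership
// re-check before each run, and interruption of the in-flight run when
// leadership is lost.
final class LeaderAwareServiceRunner {
  private final BooleanSupplier isLeaderReady; // stand-in for the real leader check
  private final long settleDelayMs;            // assumed >= raft.server.rpc.timeout.max
  private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r, "leader-aware-service");
    t.setDaemon(true); // do not keep the JVM alive for background runs
    return t;
  });
  private Future<?> current; // guarded by "this"

  LeaderAwareServiceRunner(BooleanSupplier isLeaderReady, long settleDelayMs) {
    this.isLeaderReady = isLeaderReady;
    this.settleDelayMs = settleDelayMs;
  }

  /** Starts one run if leadership survives the settle delay; returns true if submitted. */
  boolean startRun(Runnable task) {
    if (!isLeaderReady.getAsBoolean()) {
      return false;
    }
    try {
      // Wait out the window in which a stale leader may not have stepped down yet.
      TimeUnit.MILLISECONDS.sleep(settleDelayMs);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve interrupt status for the caller
      return false;
    }
    // Re-check: skip this run if leadership was lost during the wait.
    if (!isLeaderReady.getAsBoolean()) {
      return false;
    }
    synchronized (this) {
      current = executor.submit(task);
    }
    return true;
  }

  /** Invoked on a leadership-change notification; interrupts any in-flight run. */
  synchronized void onLeadershipLost() {
    if (current != null) {
      current.cancel(true); // interrupts the worker thread if the task is running
      current = null;
    }
  }

  void shutdown() {
    executor.shutdownNow();
  }
}
```

The sleep narrows the window but does not close it on its own: leadership can still change between the second check and the submit. Also, cancel(true) only sets the interrupt flag, so the task must check for interruption (or block in interruptible calls) for the stop to take effect.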

This is only one instance; it can be expanded into a story to review and
consolidate the consistency guarantees of OM and SCM background services.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
