[ https://issues.apache.org/jira/browse/RATIS-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17758177#comment-17758177 ]
Tsz-wo Sze commented on RATIS-1874:
-----------------------------------
[~tanxinyu], in Apache Ozone, we detect that the leader is ready with the
following steps. Please check whether this approach works for your use case
before adding the new notifyLeaderReady API.
1. Record the leader term in `notifyLeaderChanged`.
https://github.com/apache/ozone/blob/fcf5b17a4ede0c55c76aa337438498361c0a5dd3/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java#L282-L283
2. In `notifyTermIndexUpdated`, compare the updated term with the recorded
leader term and check `DivisionInfo.isLeaderReady()`.
https://github.com/apache/ozone/blob/fcf5b17a4ede0c55c76aa337438498361c0a5dd3/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/ha/SCMStateMachine.java#L357-L361
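The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Ozone code: `LeaderReadyDetector` and its method names are hypothetical, and the two suppliers are stand-ins for the current term and `isLeaderReady()` values that a real state machine would read from the server's `DivisionInfo`.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

/** Hypothetical helper for illustration; not part of Ratis or Ozone. */
final class LeaderReadyDetector {
  private static final long NO_TERM = -1L;

  // Term recorded when this peer became leader; NO_TERM when nothing is pending.
  private final AtomicLong pendingLeaderTerm = new AtomicLong(NO_TERM);
  private final LongSupplier currentTerm;    // assumed stand-in for the division's current term
  private final BooleanSupplier leaderReady; // assumed stand-in for DivisionInfo.isLeaderReady()
  private final Runnable onLeaderReady;      // leader-only startup, e.g. load balancing

  LeaderReadyDetector(LongSupplier currentTerm, BooleanSupplier leaderReady,
      Runnable onLeaderReady) {
    this.currentTerm = currentTerm;
    this.leaderReady = leaderReady;
    this.onLeaderReady = onLeaderReady;
  }

  /** Step 1: call from notifyLeaderChanged and record the leader term. */
  void onLeaderChanged(boolean thisPeerIsLeader) {
    pendingLeaderTerm.set(thisPeerIsLeader ? currentTerm.getAsLong() : NO_TERM);
  }

  /** Step 2: call from notifyTermIndexUpdated and compare the term. */
  void onTermIndexUpdated(long term, long index) {
    final long pending = pendingLeaderTerm.get();
    if (pending == NO_TERM || term != pending || !leaderReady.getAsBoolean()) {
      return; // not the leader's term, or the leader is not ready yet
    }
    // Clear the recorded term so the callback runs at most once per leadership.
    if (pendingLeaderTerm.compareAndSet(pending, NO_TERM)) {
      onLeaderReady.run();
    }
  }
}
```

A state machine could forward `notifyLeaderChanged` to `onLeaderChanged` (passing whether the new leader id equals its own peer id) and `notifyTermIndexUpdated` to `onTermIndexUpdated`. Clearing the recorded term after the callback is a design choice in this sketch so that later updates in the same term do not start the leader-only modules again.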
> Add notifyLeaderReady function in IStateMachine
> -----------------------------------------------
>
> Key: RATIS-1874
> URL: https://issues.apache.org/jira/browse/RATIS-1874
> Project: Ratis
> Issue Type: Improvement
> Components: StateMachine
> Reporter: Xinyu Tan
> Assignee: Xinyu Tan
> Priority: Major
> Time Spent: 40m
> Remaining Estimate: 0h
>
> In a highly available metadata service built on top of Ratis, such as IoTDB's
> confignode, the Leader node launches more modules than a Follower node,
> for example load balancing.
> Currently, we use the "notifyLeaderChanged" callback to prompt the Leader
> to start these modules. However, when this callback is invoked, the leader's
> state machine might not have fully recovered, so reading directly from
> certain modules can return outdated data.
> One approach in this scenario would be to use a linearizable read
> interface together with corresponding retry logic (such as timeouts
> or waiting until a leader is elected). However, such modifications would
> require significant changes to our codebase. Therefore, we prefer an
> alternative solution: adding a "notifyLeaderReady" interface to
> the "StateMachine". This interface would be invoked only when the Leader's
> state machine applies the first log entry of its current term. This
> adjustment would ensure that certain modules recover accurately.
>
> [~szetszwo] What's your opinion?