[ https://issues.apache.org/jira/browse/RATIS-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17155319#comment-17155319 ]
runzhiwang commented on RATIS-800:
----------------------------------
[~ljain] Thanks for the review.
bq. Balancing the leader in an active Ratis ring might be difficult to achieve. For a candidate to be elected as leader, its term and index should be >= the follower's term and index. Even if we trigger an election, it is not guaranteed that the datanode will become leader.
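The reviewer's condition corresponds to Raft's vote restriction: a server grants its vote only if the candidate's log is at least as up-to-date as its own. Raft does not require the term and the index to be >= independently. It compares the term of the last log entry first and uses the index only as a tie-breaker. A minimal Python sketch of that comparison follows; the names `LastEntry` and `candidate_log_ok` are hypothetical, not Ratis APIs, and the separate rejection of a RequestVote carrying a lower current term is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LastEntry:
    """Term and index of the last entry in a server's log (hypothetical type)."""
    term: int
    index: int


def candidate_log_ok(candidate: LastEntry, voter: LastEntry) -> bool:
    """Raft's 'at least as up-to-date' vote check.

    The last entries' terms are compared first; the index only breaks ties
    when those terms are equal. Python tuple comparison is lexicographic,
    which gives exactly this ordering.
    """
    return (candidate.term, candidate.index) >= (voter.term, voter.index)


# A candidate whose last entry has a newer term wins even with a shorter log.
print(candidate_log_ok(LastEntry(term=3, index=5), LastEntry(term=2, index=10)))  # True
# Same last term but a shorter log: the vote is refused.
print(candidate_log_ok(LastEntry(term=2, index=9), LastEntry(term=2, index=10)))  # False
```

This is also why triggering an election on a chosen datanode cannot guarantee it wins: any voter with a more up-to-date log will refuse it.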
We can first focus on leader balancing. The Raft paper handles the case where the transfer fails: if leadership transfer does not complete after about an election timeout, the prior leader aborts the transfer, remains the leader, and resumes accepting client requests.
bq. To transfer leadership in Raft, the prior leader sends its log entries to the target server, then the target server runs an election without waiting for an election timeout to elapse. The prior leader thus ensures that the target server has all committed entries at the start of its term, and, as in normal elections, the majority voting guarantees the safety properties (such as the Leader Completeness Property) are maintained. The following steps describe the process in more detail:
bq. 1. The prior leader stops accepting new client requests.
bq. 2. The prior leader fully updates the target server’s log to match its own, using the normal log replication mechanism described in Section 3.5.
bq. 3. The prior leader sends a TimeoutNow request to the target server. This request has the same effect as the target server’s election timer firing: the target server starts a new election (incrementing its term and becoming a candidate).
bq. Once the target server receives the TimeoutNow request, it is highly likely to start an election before any other server and become leader in the next term. Its next message to the prior leader will include its new term number, causing the prior leader to step down. At this point, leadership transfer is complete.
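The hand-off in the passage above comes down to two term rules: TimeoutNow makes the target increment its term and become a candidate, and any message carrying a higher term makes the prior leader adopt that term and step down. A minimal Python sketch, assuming hypothetical `Server` and `Role` types that track only the term and role:

```python
from enum import Enum


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


class Server:
    """Hypothetical server tracking only its term and role."""

    def __init__(self, name, term, role=Role.FOLLOWER):
        self.name = name
        self.current_term = term
        self.role = role

    def on_timeout_now(self):
        # Same effect as the election timer firing: increment the term,
        # become a candidate, and ask for votes.
        self.current_term += 1
        self.role = Role.CANDIDATE
        return {"type": "RequestVote", "term": self.current_term, "sender": self.name}

    def on_message(self, msg):
        # Any message carrying a higher term makes the receiver adopt that
        # term and step down to follower.
        if msg["term"] > self.current_term:
            self.current_term = msg["term"]
            self.role = Role.FOLLOWER


old_leader = Server("s1", term=5, role=Role.LEADER)
target = Server("s2", term=5)
request = target.on_timeout_now()
old_leader.on_message(request)
print(target.role, target.current_term)          # Role.CANDIDATE 6
print(old_leader.role, old_leader.current_term)  # Role.FOLLOWER 6
```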
bq. It is also possible for the target server to fail; in this case, the cluster must resume client operations. If leadership transfer does not complete after about an election timeout, the prior leader aborts the transfer and resumes accepting client requests. If the prior leader was mistaken and the target server is actually operational, then at worst this mistake will result in an extra election, after which client operations will be restored.
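The abort rule can be sketched as a deadline the prior leader checks periodically. This is a hedged Python sketch, not Ratis's implementation: `LeadershipTransfer`, `poll` and `election_timeout_s` are hypothetical names, and the clock is injectable only so the example can be run deterministically.

```python
import time


class LeadershipTransfer:
    """Hypothetical bookkeeping for the abort-after-an-election-timeout rule."""

    def __init__(self, election_timeout_s, clock=time.monotonic):
        self.election_timeout_s = election_timeout_s
        self.clock = clock  # injectable so the sketch can be tested
        self.deadline = None
        self.accepting_requests = True

    def start(self):
        # Transfer begins: stop serving clients and arm the deadline.
        self.accepting_requests = False
        self.deadline = self.clock() + self.election_timeout_s

    def poll(self, still_leader):
        """Call periodically; returns True only when the transfer is aborted."""
        if self.deadline is None:
            return False
        if not still_leader:
            # A higher term arrived and we stepped down: transfer completed.
            self.deadline = None
            return False
        if self.clock() >= self.deadline:
            # No new leader within about an election timeout: abort, keep
            # leading, and resume accepting client requests.
            self.deadline = None
            self.accepting_requests = True
            return True
        return False


fake_now = [0.0]
transfer = LeadershipTransfer(election_timeout_s=1.0, clock=lambda: fake_now[0])
transfer.start()
fake_now[0] = 0.5
print(transfer.poll(still_leader=True))  # False: still waiting
fake_now[0] = 1.2
print(transfer.poll(still_leader=True))  # True: aborted
print(transfer.accepting_requests)       # True
```

For leader balancing this means a failed attempt costs at most about one election timeout of unavailability plus, in the worst case, one extra election.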
> Make Ratis consume recommended leader host from the pipeline creator
> --------------------------------------------------------------------
>
> Key: RATIS-800
> URL: https://issues.apache.org/jira/browse/RATIS-800
> Project: Ratis
> Issue Type: Sub-task
> Reporter: Li Cheng
> Assignee: runzhiwang
> Priority: Critical
>
> Start a Jira for suggested leader semantics. It would help Ratis performance
> if Ratis could consume the leader host recommended by its upstream user, such
> as Ozone. The user can choose the leader host based on load balance and rack
> awareness.
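One way a pipeline creator might pick the host to recommend, sketched in Python under stated assumptions: it tracks how many pipelines each datanode currently leads in a hypothetical `leader_counts` map, `recommend_leader` is a hypothetical helper rather than an Ozone or Ratis API, and rack awareness is left out of this sketch.

```python
def recommend_leader(pipeline_nodes, leader_counts):
    """Pick the pipeline member that currently leads the fewest pipelines.

    `leader_counts` maps node name -> number of pipelines it leads; nodes
    missing from the map are treated as leading none. Ties are broken by name
    so the recommendation is deterministic.
    """
    return min(pipeline_nodes, key=lambda node: (leader_counts.get(node, 0), node))


counts = {"dn1": 3, "dn2": 1, "dn3": 1}
print(recommend_leader(["dn1", "dn2", "dn3"], counts))  # dn2
```

The result is only a hint: as discussed in the comment above, Ratis cannot guarantee that the recommended node wins an election.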