[ https://issues.apache.org/jira/browse/KUDU-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Grant Henke updated KUDU-556:
-----------------------------
Labels: scalability (was: )
> Client: When quorum leader times out, client should try other members of the
> quorum before returning to the Master
> ------------------------------------------------------------------------------------------------------------------
>
> Key: KUDU-556
> URL: https://issues.apache.org/jira/browse/KUDU-556
> Project: Kudu
> Issue Type: Bug
> Components: client
> Affects Versions: M4.5
> Reporter: Mike Percy
> Priority: Major
> Labels: scalability
>
> The client currently goes straight to the Master when the leader of a quorum
> dies. This has some deleterious effects:
> 1. Puts significant additional load on the Master during tablet failover.
> 2. Adds recovery delay, because the client must wait not only for a new leader
> to be elected but also for that leader to heartbeat to the Master with a full
> tablet report and for that report to become visible on the Master.
> Instead, we should use the following algorithm (TODO: this only works for
> writes; work out details for consistent leader-only reads):
> 1. If the leader cannot be contacted, try other quorum members in a
> round-robin fashion.
> 2. If a write is attempted on a non-leader replica, an error including a
> redirect to the current leader should be returned by the server. If the
> leader is unknown, that should also be indicated.
> 3. If the client is able to find the leader in this fashion, continue. If all
> followers are exhausted without finding the leader, return to the Master to
> look up the latest quorum information and repeat until timeout.
> TODO: Since there are valid cases in which a client may read from a follower or
> learner, read operations may need to specify whether they must be consistent
> leader reads or whether they may be served by followers.
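
The write-path algorithm proposed in the description can be sketched roughly as follows. This is an illustrative Python sketch only, not the Kudu client implementation: the names write_with_failover, send_write, lookup_quorum, WriteResult and Status are hypothetical, and it assumes the server's "not leader" error carries an optional leader address as suggested in step 2.

```python
import time
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, List, Optional


class Status(Enum):
    OK = auto()
    NOT_LEADER = auto()   # replica answered, but it is not the leader
    UNREACHABLE = auto()  # timeout or connection failure


@dataclass
class WriteResult:
    status: Status
    leader_hint: Optional[str] = None  # redirect target; None if leader unknown


def write_with_failover(
    send_write: Callable[[str], WriteResult],  # hypothetical RPC wrapper
    lookup_quorum: Callable[[], List[str]],    # hypothetical Master lookup
    cached_quorum: List[str],                  # assumed: cached leader first
    deadline_secs: float,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Return the address of the replica that accepted the write."""
    deadline = clock() + deadline_secs
    quorum = list(cached_quorum)
    while clock() < deadline:
        pending = list(quorum)
        tried = set()
        # Step 1: walk the known quorum members in order.
        while pending and clock() < deadline:
            replica = pending.pop(0)
            if replica in tried:
                continue
            tried.add(replica)
            result = send_write(replica)
            if result.status is Status.OK:
                return replica
            # Step 2: a non-leader may redirect us; try that address next.
            hint = result.leader_hint
            if result.status is Status.NOT_LEADER and hint and hint not in tried:
                pending.insert(0, hint)
        # Step 3: every member failed, so only now go back to the Master.
        # A real client would presumably back off before retrying.
        quorum = lookup_quorum()
    raise TimeoutError("no leader found before the deadline")
```

The point of this shape is that the Master is contacted only after every known replica has been tried, which is what would reduce Master load during failover.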