[
https://issues.apache.org/jira/browse/HDDS-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833795#comment-17833795
]
Ivan Andika commented on HDDS-9279:
-----------------------------------
I did some brainstorming and research on follower reads. Hopefully this
highlights some ideas worth considering.
Here are the possible considerations:
* Follower replica selection strategy
** The client might need to be aware of the follower replica selection strategy
** Round-robin (TiDB)
* Follower replica selection granularity
** Connection level
** Request level
* Follower affinity: if the client fails over to the leader, it should try to
find a follower again when possible
** So that the case does not degenerate to just reading from leader
** Between followers, we should pick the one with the highest known commitIndex
*** Or we can just randomize it
** Once a proxy picks a follower, it should stick to it until there is an
exception
*** See `ObserverReadProxyProvider#shouldFindObserver`
* Might need to implement something similar to
`ObserverReadProxyProvider#getHAServiceStateWithTimeout`
** This can also act as some kind of ping for unreachable peer
** We can also randomize the node ID
*** We send the first read request to the first node
*** If it is the leader, the next read call switches to the next node
(hopefully a follower)
* Ozone seems to have `OzoneManagerProtocol#getServiceList`, which is sent to the leader
** Can be used to get the information about the OM HA service
* Pick leader / follower based on the client workload
** How do we decide whether a particular operation is a write or a read-only
operation?
*** `OmUtils#isReadOnly(OMRequest)`?
*** Or annotation (ReadOnly annotation in HDFS)?
* Possibility of using two proxies, one for write and one for read
** Write → `OMFailoverProxyProvider`
** Read → `ReadOnlyOMFailOverProxyProvider`
*** We can have a client configuration that will use the
`OMFailoverProxyProvider` instead
* Possibility of using `InvocationHandler`
** Similar to `ObserverReadInvocationHandler`
** Ozone also has an `InvocationHandler` in the form of `TraceAllMethod`
*** Initialized in `TracingUtil#createProxy`
*** Called in `RpcClient#createOzoneManagerClient`
* For write followed by read
** It may be possible to save the `ReadIndex` call from the follower to the
leader by attaching an index when the client writes to the leader and passing
it to the follower
*** Similar to observer namenodes
* This might reduce the consistency guarantee from linearizable to
read-your-own-write
** Might not apply to S3G since it is stateless and each request might be
routed to any S3G
** See how HDFS Router is implemented
* Leader failover handling (might not need to implement now)
** Leader will throw `NotLeaderException`, should we just move to the next
proxy?
** Might pick the follower with largest commitIndex
* Slow follower handling
** Should we keep track of each OM commitIndex?
** Among the followers, we can treat the highest commitIndex as the preferred
commitIndex
** Best effort: No need for client to contact all 3 OMs at once
*** Broadcast might be related to `RequestHedgingProxyProvider`
*** [https://grpc.io/docs/guides/request-hedging/]
* Configuration to check whether the follower is far behind
** Failover to the next follower / leader
** A Ratis follower can also set a threshold on how far it lags behind the
leader
*** After the threshold is hit, the follower stops serving reads
* Think about Third-party Communications (3PC)
** If one client modifies the namespace and passes that knowledge to other
clients, the latter ones should be able to read from Observer the same or a
later state of the namespace, but not an earlier state.
** I think this is not needed since the consistency is linearizable
* Compatibility
** If a client tries to read from an OM follower that does not have
linearizable read enabled, the follower should throw `NotLeaderException`
* Topology awareness
** We can query the nearest follower instead
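Some of the routing ideas above (send writes to the leader, send reads to a follower, stick to that follower until a failure, then rotate to the next node) could be sketched roughly as follows. This is only an illustration under stated assumptions: `FollowerReadRouter`, its methods, and the string command names are hypothetical and are not Ozone APIs, and the read-only set merely stands in for a check such as `OmUtils#isReadOnly`.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of client-side routing; not an Ozone class. */
final class FollowerReadRouter {
  private final List<String> nodeIds;       // all OM node IDs, in a fixed order
  private final Set<String> readOnlyCmds;   // assumed stand-in for a read-only check
  private String leaderId;                  // last known leader
  private int readIdx;                      // index of the node currently used for reads

  FollowerReadRouter(List<String> nodeIds, String leaderId, Set<String> readOnlyCmds) {
    if (nodeIds.isEmpty()) {
      throw new IllegalArgumentException("at least one OM node is required");
    }
    this.nodeIds = List.copyOf(nodeIds);
    this.readOnlyCmds = Set.copyOf(readOnlyCmds);
    this.leaderId = leaderId;
    this.readIdx = 0;
    skipLeader();
  }

  /** Writes go to the known leader; reads go to the sticky read node. */
  String pickNode(String cmdType) {
    if (!readOnlyCmds.contains(cmdType)) {
      return leaderId;
    }
    return nodeIds.get(readIdx);
  }

  /** A read against the current node failed: rotate to the next non-leader node. */
  void onReadFailure() {
    readIdx = (readIdx + 1) % nodeIds.size();
    skipLeader();
  }

  /** The client learned about a new leader (for example from a failed write). */
  void onLeaderChange(String newLeaderId) {
    leaderId = newLeaderId;
    skipLeader();
  }

  /** Advance past the leader so reads do not degenerate to leader-only reads. */
  private void skipLeader() {
    // Bounded by the node count, so a single-node setup falls back to the leader.
    for (int i = 0; i < nodeIds.size() && nodeIds.get(readIdx).equals(leaderId); i++) {
      readIdx = (readIdx + 1) % nodeIds.size();
    }
  }
}
```

The sketch rotates through followers in order and does not track commitIndex; picking the follower with the highest known commitIndex, or a randomized one, would replace the rotation step.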
> OM HA: support read from followers
> ----------------------------------
>
> Key: HDDS-9279
> URL: https://issues.apache.org/jira/browse/HDDS-9279
> Project: Apache Ozone
> Issue Type: Improvement
> Components: OM HA
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
> Labels: pull-request-available
>
> Ratis has a new Linearizable Read (RATIS-1557) feature, including reading
> from the followers. In this JIRA, we will change OM to serve read requests
> from any OM servers, including the follower OMs.