[ 
https://issues.apache.org/jira/browse/HDDS-9279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833795#comment-17833795
 ] 

Ivan Andika commented on HDDS-9279:
-----------------------------------

I did some brainstorming and research on follower reads. Hopefully this will 
highlight some ideas around them.

Here are the possible considerations:
 * Follower replica selection strategy
 ** Might need to be aware of follower replica selection strategy
 ** Round-robin (TiDB)
 * Follower replica selection granularity
 ** Connection level
 ** Request level
 * Follower affinity: If the client fails over to the leader, it should try to 
find a follower again when possible
 ** So that the case does not degenerate to just reading from leader
 ** Between followers, we should pick the one with the highest known commitIndex
 *** Or we can just randomize it
 ** Once a proxy picks a follower, it should stick to it until there is an 
exception
 *** See `ObserverReadProxyProvider#shouldFindObserver`
 * Might need to implement something similar to 
`ObserverReadProxyProvider#getHAServiceStateWithTimeout`
 ** This can also act as some kind of ping for unreachable peer
 ** We can also randomize the node ID
 *** We send the first read request to the first node
 *** If it’s the leader, the next read call will switch to the next node 
(hopefully a follower)
 * Ozone seems to have `OzoneManagerProtocol#getServiceList`, which goes to the leader
 ** Can be used to get the information about the OM HA service
 * Pick leader / follower based on the client workload
 ** How to determine whether a particular operation is a write or a read-only 
operation?
 *** OmUtils#isReadOnly(OMRequest)?
 *** Or annotation (ReadOnly annotation in HDFS)?
 * Possibility of using two proxies, one for write and one for read
 ** Write → `OMFailoverProxyProvider`
 ** Read → `ReadOnlyOMFailOverProxyProvider`
 *** We can have a client configuration that will use the 
`OMFailoverProxyProvider` instead
 * Possibility of using `InvocationHandler`
 ** Similar to `ObserverReadInvocationHandler`
 ** Ozone also has an `InvocationHandler` in the form of `TraceAllMethod`
 *** Initialized in `TracingUtil#createProxy`
 *** Called in `RpcClient#createOzoneManagerClient`
 * For write followed by read
 ** There is a possibility of saving the `ReadIndex` call from follower to 
leader by attaching an index when the client writes to the leader and passing 
it to the follower
 *** Similar to observer namenodes
 * This might reduce the consistency guarantee from linearizable to 
read-your-own-write
 ** Might not be applicable to S3G since it’s stateless and each request might 
be routed to any S3G
 ** See how HDFS Router is implemented
 * Leader failover handling (might not need to implement now)
 ** Leader will throw `NotLeaderException`; should we just move to the next 
proxy?
 ** Might pick the follower with largest commitIndex
 * Slow follower handling
 ** Should we keep track of each OM commitIndex?
 ** Among the followers, we can pick the highest commitIndex as the preferred 
commitIndex
 ** Best effort: No need for client to contact all 3 OMs at once
 *** Broadcast might be related to `RequestHedgingProxyProvider`
 *** [https://grpc.io/docs/guides/request-hedging/]
 * Configuration to check whether the follower is too far behind
 ** Failover to the next follower / leader 
 ** A Ratis follower can also set a threshold to keep track of its lag behind 
the leader
 *** After the threshold is hit, the follower will stop serving reads
 * Think about Third-party Communications (3PC)
 ** If one client modifies the namespace and passes that knowledge to other 
clients, the latter ones should be able to read from Observer the same or a 
later state of the namespace, but not an earlier state.
 ** I think this is not needed since the consistency is linearizable
 * Compatibility
 ** If a client tries to read from an OM follower that does not enable 
linearizable read, the follower should throw `NotLeaderException`
 * Topology awareness
 ** We can query the nearest follower instead
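
To make a few of the points above more concrete (request-level selection, 
sticking to a follower until an exception, skipping a follower that is too far 
behind, and falling back to the leader), here is a minimal, self-contained 
Java sketch. It is only an illustration, not Ozone code: `FollowerReadRouter`, 
`Node`, `pickTarget`, `reportFailure`, `reportRecovered` and `maxLag` are all 
hypothetical names, and the `readOnly` flag stands in for whatever 
`OmUtils#isReadOnly(OMRequest)` or an annotation would tell us. It also assumes 
the client somehow knows each OM's role and commitIndex.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical request-level router: writes go to the leader, reads to a sticky follower. */
final class FollowerReadRouter {

  /** Hypothetical client-side view of one OM; not an Ozone class. */
  static final class Node {
    final String id;
    final boolean leader;
    long commitIndex;          // last commitIndex the client learned for this node
    boolean reachable = true;  // cleared when a read on this node fails

    Node(String id, boolean leader, long commitIndex) {
      this.id = id;
      this.leader = leader;
      this.commitIndex = commitIndex;
    }
  }

  private final Node leader;
  private final List<Node> followers = new ArrayList<>();
  private final long maxLag;   // assumed threshold, in log entries behind the leader
  private int current = 0;     // index of the follower we currently stick to

  FollowerReadRouter(List<Node> nodes, long maxLag) {
    Node found = null;
    for (Node n : nodes) {
      if (n.leader) {
        found = n;
      } else {
        followers.add(n);
      }
    }
    if (found == null) {
      throw new IllegalArgumentException("no leader in node list");
    }
    this.leader = found;
    this.maxLag = maxLag;
  }

  /** A follower is usable if it is reachable and not too far behind the leader. */
  private boolean usable(Node f) {
    return f.reachable && leader.commitIndex - f.commitIndex <= maxLag;
  }

  /** Picks the target for one request; readOnly stands in for an isReadOnly check. */
  Node pickTarget(boolean readOnly) {
    if (!readOnly) {
      return leader;
    }
    // Start from the sticky follower and rotate round-robin past unusable ones.
    for (int i = 0; i < followers.size(); i++) {
      int idx = (current + i) % followers.size();
      if (usable(followers.get(idx))) {
        current = idx;
        return followers.get(idx);
      }
    }
    return leader; // no usable follower: degrade to reading from the leader
  }

  /** Called when a read on the given node failed; later reads skip it. */
  void reportFailure(Node node) {
    node.reachable = false;
  }

  /** Called when a node is known to be healthy again, for example after a ping. */
  void reportRecovered(Node node) {
    node.reachable = true;
  }

  public static void main(String[] args) {
    Node om1 = new Node("om1", true, 100);
    Node om2 = new Node("om2", false, 98);
    Node om3 = new Node("om3", false, 95);
    FollowerReadRouter router = new FollowerReadRouter(Arrays.asList(om1, om2, om3), 10);
    System.out.println("write -> " + router.pickTarget(false).id);
    System.out.println("read  -> " + router.pickTarget(true).id);
    router.reportFailure(om2);
    System.out.println("read after failure -> " + router.pickTarget(true).id);
  }
}
```

Because the router only moves on when the current follower becomes unusable, 
reads do not drift back to the leader as long as any follower is healthy, which 
is the affinity behaviour described above. Picking by highest commitIndex or at 
random instead of round-robin would only change the loop in `pickTarget`.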

> OM HA: support read from followers
> ----------------------------------
>
>                 Key: HDDS-9279
>                 URL: https://issues.apache.org/jira/browse/HDDS-9279
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: OM HA
>            Reporter: Tsz-wo Sze
>            Assignee: Tsz-wo Sze
>            Priority: Major
>              Labels: pull-request-available
>
> Ratis has a new Linearizable Read (RATIS-1557) feature, including reading 
> from the followers. In this JIRA, we will change OM to serve read requests 
> from any OM servers, including the follower OMs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
