[jira] [Updated] (HDDS-13933) Consistent Read from OM Followers

Ivan Andika (Jira) Fri, 14 Nov 2025 23:10:06 -0800


     [ https://issues.apache.org/jira/browse/HDDS-13933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ivan Andika updated HDDS-13933:
-------------------------------
    Description: 
HDDS-9279 introduced Ratis (Raft) based follower reads by adding an option to 
enable the Ratis Linearizable Read and Leader Lease features. However, previous 
performance tests showed significant performance regressions on both the client 
and the server. The root cause of these regressions has still not been 
identified.

Therefore, I suggest we pursue two concurrent approaches for consistent 
follower reads. We can then enable the one with the better performance.
 * Improve Ratis Linearizable Read and ReadIndex (following up on HDDS-9279)

In my opinion, since we are stuck on improving this, we might try the second 
approach:
 * Follow the HDFS observer read implementation (HDFS-12943)
 ** General flow
 *** The client sends an msync to the OM leader, and the OM leader replies with 
its last applied index as part of AlignmentContext#getLastSeenStateId
 *** When the client sends a request to an OM follower, the Hadoop RPC 
mechanism detects the AlignmentContext and requeues the call if 
Call#getClientStateId() > AlignmentContext#getLastSeenStateId(), i.e. the 
follower has not yet caught up to the state the client has seen (see Hadoop 
Server.Handler#run)
 **** This implies that the server RPC handler is not blocked, unlike with 
linearizable read
 ** Pros
 *** This is similar to the HDFS observer read implementation, so we know that 
this level of consistency is acceptable, and we do not need much additional 
work to show that it is correct
 **** If HDFS observer read has been deployed to production with no 
consistency issues and with acceptable performance, we can expect the same on 
Ozone
 *** Possibly better performance, since the server requeues the call to the 
RPC server
 ** Cons
 *** Only supports Hadoop RPC based clients (gRPC based clients would need to 
implement their own mechanism)
 *** Will cause the implementation to drift away from Raft, which OM is based on
 ** Current implementation plans
 *** Create one AlignmentContext for the client and one AlignmentContext for OM
 *** Create a new proxy provider, similar to ObserverReadProxyProvider, to 
route read requests to the OM followers

We hope that implementing one can uncover issues and improvements on the other.
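
The general flow of the second approach can be illustrated with a small, 
self-contained Java model. This is only a sketch under stated assumptions: the 
names FollowerReadSketch, Leader, Follower, Call and runHandlerOnce are 
hypothetical and are not Hadoop or Ozone APIs. The only part taken from the 
description above is the comparison of the client's state ID against the 
follower's last applied index, and the requeue when the follower is behind.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy model of state-ID based follower reads; not Hadoop or Ozone code. */
final class FollowerReadSketch {

  /** A read call tagged with the state ID the client last saw from the leader. */
  static final class Call {
    final String key;
    final long clientStateId;

    Call(String key, long clientStateId) {
      this.key = key;
      this.clientStateId = clientStateId;
    }
  }

  /** Hypothetical leader: each write advances the applied index. */
  static final class Leader {
    private long lastAppliedIndex;

    long write() {
      return ++lastAppliedIndex;
    }

    /** Stands in for msync: reports the leader's last applied index. */
    long msync() {
      return lastAppliedIndex;
    }
  }

  /** Hypothetical follower: serves a call only once it has caught up. */
  static final class Follower {
    private long lastAppliedIndex;
    private final Queue<Call> callQueue = new ArrayDeque<>();
    private final List<String> served = new ArrayList<>();

    void submit(Call call) {
      callQueue.add(call);
    }

    /** Simulates the follower applying replicated log entries. */
    void applyUpTo(long index) {
      lastAppliedIndex = Math.max(lastAppliedIndex, index);
    }

    /** Looks at each queued call once; returns how many were requeued. */
    int runHandlerOnce() {
      int requeued = 0;
      int pending = callQueue.size();
      for (int i = 0; i < pending; i++) {
        Call call = callQueue.remove();
        if (call.clientStateId > lastAppliedIndex) {
          // Follower is behind the client's view: requeue, do not block.
          callQueue.add(call);
          requeued++;
        } else {
          served.add(call.key);
        }
      }
      return requeued;
    }

    List<String> served() {
      return served;
    }
  }

  public static void main(String[] args) {
    Leader leader = new Leader();
    Follower follower = new Follower();

    leader.write();
    leader.write();
    long stateId = leader.msync(); // client learns the leader's index

    follower.applyUpTo(1); // follower lags by one entry
    follower.submit(new Call("vol/bucket/key", stateId));
    System.out.println("requeued: " + follower.runHandlerOnce());

    follower.applyUpTo(2); // follower catches up
    System.out.println("requeued: " + follower.runHandlerOnce());
    System.out.println("served: " + follower.served());
  }
}
```

In this model the handler thread never waits for the follower to catch up; a 
call that is ahead of the follower simply goes back on the queue, which is the 
property contrasted with linearizable read above.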

  was:
HDDS-9279 introduced Ratis (Raft) based follower reads by adding an option to 
enable the Ratis Linearizable Read and Leader Lease features. However, previous 
performance tests showed significant performance regressions on both the client 
and the server. The root cause of these regressions has still not been 
identified.

Therefore, I suggest we pursue two concurrent approaches for consistent 
follower reads. We can then enable the one with the better performance. 
Additionally, implementing one can uncover issues and improvements on the 
other.
 * Improve Ratis Linearizable Read and ReadIndex (following up on HDDS-9279)

In my opinion, since we are stuck on improving this, we might try the second 
approach:
 * Follow the HDFS observer read implementation (HDFS-12943)
 ** General flow
 *** The client sends an msync to the OM leader, and the OM leader replies with 
its last applied index as part of AlignmentContext#getLastSeenStateId
 *** When the client sends a request to an OM follower, the Hadoop RPC 
mechanism detects the AlignmentContext and requeues the call if 
Call#getClientStateId() > AlignmentContext#getLastSeenStateId() (see Hadoop 
Server.Handler#run)
 **** This implies that the server RPC handler is not blocked, unlike with 
linearizable read
 ** Pros
 *** This is similar to the HDFS observer read implementation, so we know that 
this level of consistency is acceptable, and we do not need much additional 
work to show that it is correct
 **** If HDFS observer read has been deployed to production with no 
consistency issues and with acceptable performance, we can expect the same on 
Ozone
 ** Cons
 *** Only supports Hadoop RPC based clients (gRPC based clients would need to 
implement their own mechanism)
 *** Will cause the implementation to drift away from Raft, which OM is based on
 ** Current implementation plans
 *** Create one AlignmentContext for the client and one AlignmentContext for OM
 *** Create a new proxy provider, similar to ObserverReadProxyProvider, to 
route read requests to the OM followers


> Consistent Read from OM Followers
> ---------------------------------
>
>                 Key: HDDS-13933
>                 URL: https://issues.apache.org/jira/browse/HDDS-13933
>             Project: Apache Ozone
>          Issue Type: New Feature
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

