ivandika3 commented on PR #9641:
URL: https://github.com/apache/ozone/pull/9641#issuecomment-3788494881

   Thanks @greenwich for taking a look at this.
   
   > Add OMFollowerReadMetrics to track the follower read-specific metrics to 
monitor its effectiveness, health issues, and measure performance.
   
   We already have client-side metrics (e.g. S3G metrics) to observe the performance. We can add more if needed.
   
   > As you mentioned in the comments, using the simple round-robin routing 
without checking node roles might not be ideal. Since OM already exposes 
OmRoleInfo via ServiceInfo, we can leverage it.
   
   Yes, this is one of the planned improvements. As you said, there are two possible implementations, each with its own pros and cons:
   1. We can periodically refresh the OM roles in the background and cache them
       - Pros: The OM role refresh is not on the read critical path, so it does not add read latency
       - Cons
          - The cached OM roles can be stale, depending on the background service interval and the refresh RPC latency
          - A background service might send unnecessary RPCs for idle clients (a large number of clients will generate many of these RPCs). Ideally, we only send the RPCs when we actually need them.
          - We also need to decide how to check the OM service status: send `getServiceList` to the leader (which has a more complete view of the Raft group), or send `getRoleInfo` to each of the OM nodes (which might return more detailed information)
   2. We can send a `checkRole` RPC per request or after every failover
       - Pros: Most up-to-date data
       - Cons
         - Higher latency due to the additional RPC
         - Higher number of RPCs (2x the number of read requests if the role is checked on every request)
   
   Given these tradeoffs, we need to implement and benchmark the options to find the right balance, or a better solution.
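To make the two options concrete, here is a minimal Java sketch. All names (`RoleSource`, `CachedRoleSource`, `OnDemandRoleSource`, `FollowerSelector`) are hypothetical and not existing Ozone classes; the `fetch` supplier stands in for whichever RPC is chosen (`getServiceList`, `getRoleInfo` or `checkRole`). Both role sources feed the same role-aware round-robin selector, which assumes we prefer followers and fall back to plain round-robin over all OMs when no follower is known.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class OmRoleRouting {
  enum Role { LEADER, FOLLOWER }

  /** Hypothetical role lookup, e.g. backed by getServiceList, getRoleInfo or checkRole. */
  interface RoleSource {
    Map<String, Role> currentRoles();
  }

  /** Option 1: refresh roles in the background; reads use the cached snapshot. */
  static final class CachedRoleSource implements RoleSource, AutoCloseable {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final Supplier<Map<String, Role>> fetch;
    private volatile Map<String, Role> snapshot;

    CachedRoleSource(Supplier<Map<String, Role>> fetch, long intervalMs) {
      this.fetch = fetch;
      this.snapshot = fetch.get(); // initial fetch so the cache is never empty
      // Keeps sending RPCs even while the client is idle.
      scheduler.scheduleAtFixedRate(this::refresh, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
    }

    private void refresh() {
      try {
        snapshot = fetch.get();
      } catch (RuntimeException e) {
        // Keep the old snapshot; an uncaught exception would cancel future refreshes.
      }
    }

    @Override
    public Map<String, Role> currentRoles() {
      return snapshot; // no RPC on the read path, but up to one interval stale
    }

    @Override
    public void close() {
      scheduler.shutdownNow();
    }
  }

  /** Option 2: fetch roles on every call (one extra RPC per read). */
  static final class OnDemandRoleSource implements RoleSource {
    private final Supplier<Map<String, Role>> fetch;
    OnDemandRoleSource(Supplier<Map<String, Role>> fetch) {
      this.fetch = fetch;
    }
    @Override
    public Map<String, Role> currentRoles() {
      return fetch.get(); // most up to date, at the cost of extra latency
    }
  }

  /** Round-robin over known followers; falls back to all OMs if none is known. */
  static final class FollowerSelector {
    private final List<String> nodeIds;
    private final RoleSource roles;
    private final AtomicInteger next = new AtomicInteger();
    FollowerSelector(List<String> nodeIds, RoleSource roles) {
      this.nodeIds = List.copyOf(nodeIds);
      this.roles = roles;
    }
    String select() {
      Map<String, Role> current = roles.currentRoles();
      List<String> candidates = new ArrayList<>();
      for (String id : nodeIds) {
        if (current.get(id) == Role.FOLLOWER) {
          candidates.add(id);
        }
      }
      if (candidates.isEmpty()) {
        candidates = nodeIds;
      }
      return candidates.get(Math.floorMod(next.getAndIncrement(), candidates.size()));
    }
  }

  public static void main(String[] args) {
    List<String> oms = List.of("om1", "om2", "om3");
    Supplier<Map<String, Role>> fetch =
        () -> Map.of("om1", Role.LEADER, "om2", Role.FOLLOWER, "om3", Role.FOLLOWER);
    FollowerSelector onDemand = new FollowerSelector(oms, new OnDemandRoleSource(fetch));
    System.out.println(onDemand.select() + " " + onDemand.select()); // om2 om3
    try (CachedRoleSource cached = new CachedRoleSource(fetch, 30_000)) {
      System.out.println(new FollowerSelector(oms, cached).select()); // om2
    }
  }
}
```

Swapping `CachedRoleSource` for `OnDemandRoleSource` only changes where the role lookup is paid for (a background thread vs. each read), which is exactly what the benchmarks would need to measure.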
   
   > Read after write consistency. 
   
   This should be guaranteed by Ratis linearizable reads, which use the Raft ReadIndex protocol, so the client does not need custom logic to handle this. I have added `TestOzoneManagerHAFollowerReadWithAllRunning#testLinearizableReadConsistency` to test the consistency.
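For intuition on why the client needs no extra logic, here is a simplified single-process model of the ReadIndex idea. It is not Ratis code: `Leader`, `Follower` and `linearizableRead` are hypothetical names, replication is modeled by the follower reading the leader's committed log directly, and the leader's leadership-confirmation step is omitted. Before serving a read, the follower obtains the leader's commit index and applies its log at least up to that index, so any write acknowledged before the read started is visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReadIndexSketch {

  /** A committed log entry representing put(key, value). */
  static final class Entry {
    final String key;
    final String value;

    Entry(String key, String value) {
      this.key = key;
      this.value = value;
    }
  }

  /** Leader: a write is acknowledged once appended to the committed log. */
  static final class Leader {
    final List<Entry> committedLog = new ArrayList<>();

    void write(String key, String value) {
      committedLog.add(new Entry(key, value));
    }

    /** Read index = current commit index (real Raft first confirms leadership). */
    long readIndex() {
      return committedLog.size();
    }
  }

  /** Follower: its state machine may lag behind the leader's commit index. */
  static final class Follower {
    private final Leader leader;
    private final Map<String, String> state = new HashMap<>();
    private long appliedIndex = 0;

    Follower(Leader leader) {
      this.leader = leader;
    }

    /** Serves whatever has been applied locally; may miss recent writes. */
    String staleRead(String key) {
      return state.get(key);
    }

    /** Applies entries up to the leader's read index before serving the read. */
    String linearizableRead(String key) {
      long readIndex = leader.readIndex();
      // A real follower waits for replication; here entries are applied synchronously.
      while (appliedIndex < readIndex) {
        Entry e = leader.committedLog.get((int) appliedIndex);
        state.put(e.key, e.value);
        appliedIndex++;
      }
      return state.get(key);
    }
  }

  public static void main(String[] args) {
    Leader leader = new Leader();
    Follower follower = new Follower(leader);
    leader.write("/vol1/bucket1/key1", "v1");
    System.out.println(follower.staleRead("/vol1/bucket1/key1"));        // null
    System.out.println(follower.linearizableRead("/vol1/bucket1/key1")); // v1
  }
}
```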
   
   
   > Good to see the performance benchmarks about read throughput, latency, etc.
   
   I'm working on this and will share the benchmark results once they are available.

