ivandika3 commented on PR #9641:
URL: https://github.com/apache/ozone/pull/9641#issuecomment-3788494881
Thanks @greenwich for taking a look at this.
> Add OMFollowerReadMetrics to track the follower read-specific metrics to monitor its effectiveness, health issues, and measure performance.
We already have client-side metrics (e.g. S3G metrics) to observe the read performance. Of course, we can add more on the OM side if needed.
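For reference, a metrics source like the suggested OMFollowerReadMetrics could be a small Hadoop metrics2 class along these lines. This is a minimal sketch only; the counter names and the registration are my assumptions, not code from this PR:

```java
import org.apache.hadoop.metrics2.MetricsSystem;
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableCounterLong;
import org.apache.hadoop.metrics2.lib.MutableRate;

/** Hypothetical follower-read metrics source; metric names are illustrative only. */
@Metrics(about = "OM follower read metrics", context = "ozone")
public class OMFollowerReadMetrics {
  private static final String SOURCE_NAME =
      OMFollowerReadMetrics.class.getSimpleName();

  @Metric private MutableCounterLong numFollowerReads;          // reads served by a follower
  @Metric private MutableCounterLong numFollowerReadFailovers;  // reads redirected to the leader
  @Metric private MutableRate followerReadLatency;              // end-to-end follower read latency

  public static OMFollowerReadMetrics create() {
    MetricsSystem ms = DefaultMetricsSystem.instance();
    return ms.register(SOURCE_NAME, "OM follower read metrics",
        new OMFollowerReadMetrics());
  }

  public void incrNumFollowerReads() {
    numFollowerReads.incr();
  }

  public void incrNumFollowerReadFailovers() {
    numFollowerReadFailovers.incr();
  }

  public void addFollowerReadLatency(long latencyNanos) {
    followerReadLatency.add(latencyNanos);
  }
}
```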
> As you mentioned in the comments, using the simple round-robin routing without checking node roles might not be ideal. Since OM already exposes OmRoleInfo via ServiceInfo, we can leverage it.
Yes, this is the planned improvement. As you said, there are two possible implementations, each with its own pros and cons:
1. Periodically refresh the OM roles in the background and cache them.
   - Pros: the role refresh stays out of the read critical path and does not add latency to reads.
   - Cons:
     - The cached OM roles can be stale, depending on the background refresh interval and the RPC latency.
     - A background refresher sends unnecessary RPCs for idle clients (and a large number of clients generates a lot of these RPCs). Ideally, we only send the RPC when we actually need it.
     - We also need to decide how to check the OM roles: send `getServiceList` to the leader (which has a more complete view of the Raft group), or send `getRoleInfo` to each OM node (which might return more detailed information).
2. Send a checkRole RPC per request, or after every failover.
   - Pros: the role information is always up to date.
   - Cons:
     - Higher latency due to the additional RPC.
     - Higher number of RPCs (2x the number of read requests).
Given these tradeoffs, we need to implement and benchmark both approaches to find the right balance, or a better solution; a rough sketch of the cached-role approach (option 1) is included below.
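To make the first option more concrete, here is a minimal sketch of a background-refreshed role cache. The `roleFetcher` supplier, the class name, and the refresh handling are all assumptions on my part; which RPC backs it (`getServiceList` vs `getRoleInfo`) is exactly the open question above:

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Hypothetical client-side cache of OM roles, refreshed off the read path. */
public class OmRoleCache implements AutoCloseable {
  // Maps OM node id -> role (e.g. "LEADER" / "FOLLOWER"); may be stale between refreshes.
  private final AtomicReference<Map<String, String>> roles =
      new AtomicReference<>(Collections.emptyMap());
  private final Supplier<Map<String, String>> roleFetcher;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public OmRoleCache(Supplier<Map<String, String>> roleFetcher,
      long refreshIntervalMs) {
    this.roleFetcher = roleFetcher;
    // Refresh in the background so reads never wait on the role RPC;
    // staleness is bounded by the refresh interval.
    scheduler.scheduleWithFixedDelay(this::refresh,
        0, refreshIntervalMs, TimeUnit.MILLISECONDS);
  }

  private void refresh() {
    try {
      roles.set(roleFetcher.get());
    } catch (RuntimeException e) {
      // Keep serving the last known roles if the refresh RPC fails.
    }
  }

  /** Returns the cached role, or null if unknown (caller can fall back to round-robin). */
  public String getRole(String omNodeId) {
    return roles.get().get(omNodeId);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

Note that this sketch still has the idle-client downside called out above: the scheduled task keeps firing whether or not reads are in flight.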
> Read after write consistency.
This should be guaranteed by the Ratis linearizable read (the Raft ReadIndex protocol), so the client does not need custom logic to handle it. I have added `TestOzoneManagerHAFollowerReadWithAllRunning#testLinearizableReadConsistency` to test the consistency.
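For context, the client-visible guarantee that test checks is roughly the following write-then-read sequence against the public OzoneClient API. This is a sketch only, with hypothetical volume/bucket/key names; the actual test sets up an HA mini cluster with follower-read routing, which is omitted here:

```java
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.hdds.conf.OzoneConfiguration;
import org.apache.hadoop.ozone.client.OzoneBucket;
import org.apache.hadoop.ozone.client.OzoneClient;
import org.apache.hadoop.ozone.client.OzoneClientFactory;
import org.apache.hadoop.ozone.client.OzoneVolume;
import org.apache.hadoop.ozone.client.io.OzoneInputStream;
import org.apache.hadoop.ozone.client.io.OzoneOutputStream;

public class LinearizableReadSketch {
  public static void main(String[] args) throws Exception {
    OzoneConfiguration conf = new OzoneConfiguration();
    try (OzoneClient client = OzoneClientFactory.getRpcClient(conf)) {
      client.getObjectStore().createVolume("vol1");
      OzoneVolume volume = client.getObjectStore().getVolume("vol1");
      volume.createBucket("bucket1");
      OzoneBucket bucket = volume.getBucket("bucket1");

      byte[] payload = "value-1".getBytes(StandardCharsets.UTF_8);
      // The write goes through the OM leader as usual.
      try (OzoneOutputStream out = bucket.createKey("key1", payload.length)) {
        out.write(payload);
      }

      // The follower-routed read must observe the write: with ReadIndex the
      // follower waits until its applied index covers the leader's commit.
      byte[] readBack = new byte[payload.length];
      try (OzoneInputStream in = bucket.readKey("key1")) {
        int n = in.read(readBack);
        if (n != payload.length
            || !"value-1".equals(new String(readBack, StandardCharsets.UTF_8))) {
          throw new AssertionError("read-after-write consistency violated");
        }
      }
    }
  }
}
```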
> Good to see the performance benchmarks about read throughput, latency, etc.
I'm working on this. I will share the numbers once the benchmark results are out.