[
https://issues.apache.org/jira/browse/HDDS-14516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika updated HDDS-14516:
-------------------------------
Description:
From TestOzoneShellHAWithFollowerRead, it is observed that when OM enables
linearizable read, the first OM read request from each unique client (e.g.
getServiceInfo() in RpcClient initialization) has much higher latency (around
500ms) than the following OM requests from the same client (which take <10ms).
If another client sends a request, the issue happens again for the first
request of that client.
{code:java}
2026-01-27 13:41:29,696 [IPC Server handler 14 on default port 15041] INFO
protocolPB.OzoneManagerProtocolServerSideTranslatorPB
(OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) -
Linearizable read submit request ServiceList on omNode-2 elapsed 492ms
2026-01-27 13:41:29,700 [IPC Server handler 12 on default port 15041] INFO
protocolPB.OzoneManagerProtocolServerSideTranslatorPB
(OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) -
Linearizable read submit request InfoVolume on omNode-2 elapsed 2ms
2026-01-27 13:41:29,703 [IPC Server handler 10 on default port 15041] INFO
protocolPB.OzoneManagerProtocolServerSideTranslatorPB
(OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) -
Linearizable read submit request InfoBucket on omNode-2 elapsed 1ms {code}
It does not seem to be tied to a particular request type, as I tried removing
the initial getServiceInfo(). It also does not seem to be related to the
ReadIndex slowness, since the issue still happens even after applying
[RATIS-2379|https://github.com/apache/ratis/pull/1332] and RATIS-2382.
We need to find the cause of this.
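To compare first-request latency against later requests across many runs, the
elapsed times can be pulled out of log entries like the ones above. A minimal
sketch, assuming the message text keeps the format shown in those logs; the
class and method names are hypothetical and not part of Ozone:

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Ozone: extracts fields from the
// "Linearizable read submit request ... elapsed Nms" message shown above.
final class LinearizableReadLogParser {
  // Groups: 1 = request type, 2 = OM node ID, 3 = elapsed milliseconds.
  private static final Pattern ENTRY = Pattern.compile(
      "Linearizable read submit request (\\S+) on (\\S+) elapsed (\\d+)ms");

  private LinearizableReadLogParser() { }

  /** Returns the elapsed time in ms, or empty if the line has no such message. */
  static OptionalLong elapsedMs(String line) {
    Matcher m = ENTRY.matcher(line);
    return m.find()
        ? OptionalLong.of(Long.parseLong(m.group(3)))
        : OptionalLong.empty();
  }

  /** Returns the request type (e.g. ServiceList), or null if not matched. */
  static String requestType(String line) {
    Matcher m = ENTRY.matcher(line);
    return m.find() ? m.group(1) : null;
  }

  /** True if the line reports an elapsed time above thresholdMs. */
  static boolean isSlow(String line, long thresholdMs) {
    OptionalLong elapsed = elapsedMs(line);
    return elapsed.isPresent() && elapsed.getAsLong() > thresholdMs;
  }
}
```

Passing each log line through isSlow with a threshold (for example 100ms, an
arbitrary choice) would flag the slow first request of each client.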
was:
From TestOzoneShellHAWithFollowerRead, it is observed that when OM enables
linearizable read, the first OM read request from a unique client (e.g.
getServiceInfo() in RpcClient initialization) sent to the OM will have a lot
higher latency (around 500ms) compared to the following OM requests (which only
runs for <10ms) from the same client. If another client sends a request, this
issue happens again for the first request of that client.
{code:java}
2026-01-27 13:41:29,696 [IPC Server handler 14 on default port 15041] INFO
protocolPB.OzoneManagerProtocolServerSideTranslatorPB
(OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) -
Linearizable read submit request ServiceList on omNode-2 elapsed 492ms
2026-01-27 13:41:29,700 [IPC Server handler 12 on default port 15041] INFO
protocolPB.OzoneManagerProtocolServerSideTranslatorPB
(OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) -
Linearizable read submit request InfoVolume on omNode-2 elapsed 2ms
2026-01-27 13:41:29,703 [IPC Server handler 10 on default port 15041] INFO
protocolPB.OzoneManagerProtocolServerSideTranslatorPB
(OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) -
Linearizable read submit request InfoBucket on omNode-2 elapsed 1ms {code}
It does not seem to be related to the same request as I tried to remove the
initial getServiceInfo().
We need to check the reason of this.
> Investigate high latency on first OM linearizable read request
> --------------------------------------------------------------
>
> Key: HDDS-14516
> URL: https://issues.apache.org/jira/browse/HDDS-14516
> Project: Apache Ozone
> Issue Type: Sub-task
> Reporter: Ivan Andika
> Assignee: Ivan Andika
> Priority: Major