[ 
https://issues.apache.org/jira/browse/HDDS-14516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Andika updated HDDS-14516:
-------------------------------
    Description: 
In TestOzoneShellHAWithFollowerRead, when linearizable read is enabled on the 
OM, the first read request that each new client sends to the OM (e.g. 
getServiceInfo() during RpcClient initialization) has much higher latency 
(around 500ms) than the subsequent requests from the same client (which take 
<10ms). When another client sends a request, the first request of that client 
is slow again.
{code:java}
2026-01-27 13:41:29,696 [IPC Server handler 14 on default port 15041] INFO  protocolPB.OzoneManagerProtocolServerSideTranslatorPB (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) - Linearizable read submit request ServiceList on omNode-2 elapsed 492ms
2026-01-27 13:41:29,700 [IPC Server handler 12 on default port 15041] INFO  protocolPB.OzoneManagerProtocolServerSideTranslatorPB (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) - Linearizable read submit request InfoVolume on omNode-2 elapsed 2ms
2026-01-27 13:41:29,703 [IPC Server handler 10 on default port 15041] INFO  protocolPB.OzoneManagerProtocolServerSideTranslatorPB (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) - Linearizable read submit request InfoBucket on omNode-2 elapsed 1ms
{code}
It does not seem to be specific to getServiceInfo(): when I removed the 
initial getServiceInfo() call, InfoVolume became the slow request instead. It 
also does not seem to be caused by network slowness in ReadIndex, since the 
high latency occurs in a test.

We need to investigate the cause.
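
To compare the first request against the following ones across a longer test 
log, the elapsed times can be extracted from the "Linearizable read submit 
request" messages. The following is a minimal sketch, not part of Ozone: the 
class LinearizableReadLatency, its methods, and the 100ms threshold are 
hypothetical names and values chosen for illustration, and the regular 
expression assumes the message format shown in the log excerpt above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it is not part of Ozone.
final class LinearizableReadLatency {

  // Assumes the message format from the log excerpt, e.g.
  // "Linearizable read submit request ServiceList on omNode-2 elapsed 492ms".
  private static final Pattern ELAPSED = Pattern.compile(
      "Linearizable read submit request (\\S+) on (\\S+) elapsed (\\d+)ms");

  // One parsed message: request type, OM node ID, elapsed time in milliseconds.
  static final class Entry {
    final String request;
    final String node;
    final long elapsedMs;

    Entry(String request, String node, long elapsedMs) {
      this.request = request;
      this.node = node;
      this.elapsedMs = elapsedMs;
    }
  }

  // Returns every matching message in the text, in order of appearance.
  static List<Entry> parse(String log) {
    List<Entry> entries = new ArrayList<>();
    Matcher m = ELAPSED.matcher(log);
    while (m.find()) {
      entries.add(new Entry(m.group(1), m.group(2), Long.parseLong(m.group(3))));
    }
    return entries;
  }

  // Returns the entries that took at least thresholdMs. The threshold is an
  // arbitrary cut-off chosen by the caller, not an Ozone setting.
  static List<Entry> slowRequests(List<Entry> entries, long thresholdMs) {
    List<Entry> slow = new ArrayList<>();
    for (Entry e : entries) {
      if (e.elapsedMs >= thresholdMs) {
        slow.add(e);
      }
    }
    return slow;
  }

  public static void main(String[] args) {
    // Message parts of the three log lines quoted in this issue.
    String log = String.join("\n",
        "Linearizable read submit request ServiceList on omNode-2 elapsed 492ms",
        "Linearizable read submit request InfoVolume on omNode-2 elapsed 2ms",
        "Linearizable read submit request InfoBucket on omNode-2 elapsed 1ms");
    for (Entry e : slowRequests(parse(log), 100)) {
      System.out.println(e.request + " on " + e.node + ": " + e.elapsedMs + "ms");
    }
  }
}
```

Run against a full test log, this lists only the slow requests, which makes it 
easier to check whether each one is the first request of a new client.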

  was:
In TestOzoneShellHAWithFollowerRead, when linearizable read is enabled on the 
OM, the first read request that each new client sends to the OM (e.g. 
getServiceInfo() during RpcClient initialization) has much higher latency 
(around 500ms) than the subsequent requests from the same client (which take 
<10ms). When another client sends a request, the first request of that client 
is slow again.
{code:java}
2026-01-27 13:41:29,696 [IPC Server handler 14 on default port 15041] INFO  protocolPB.OzoneManagerProtocolServerSideTranslatorPB (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) - Linearizable read submit request ServiceList on omNode-2 elapsed 492ms
2026-01-27 13:41:29,700 [IPC Server handler 12 on default port 15041] INFO  protocolPB.OzoneManagerProtocolServerSideTranslatorPB (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) - Linearizable read submit request InfoVolume on omNode-2 elapsed 2ms
2026-01-27 13:41:29,703 [IPC Server handler 10 on default port 15041] INFO  protocolPB.OzoneManagerProtocolServerSideTranslatorPB (OzoneManagerProtocolServerSideTranslatorPB.java:submitReadRequestToOM(302)) - Linearizable read submit request InfoBucket on omNode-2 elapsed 1ms
{code}
It does not seem to be specific to getServiceInfo(): when I removed the 
initial getServiceInfo() call, InfoVolume became the slow request instead. It 
also does not seem to be related to ReadIndex slowness: the issue still 
happens even when the appliedIndex remains unchanged and with optimizations 
such as [RATIS-2379|https://github.com/apache/ratis/pull/1332] and RATIS-2382 
applied. Network slowness can also be ruled out, since the slowness happens in 
a test.

We need to investigate the cause.


> Investigate high latency on first OM linearizable read request
> --------------------------------------------------------------
>
>                 Key: HDDS-14516
>                 URL: https://issues.apache.org/jira/browse/HDDS-14516
>             Project: Apache Ozone
>          Issue Type: Sub-task
>            Reporter: Ivan Andika
>            Assignee: Ivan Andika
>            Priority: Major


