ivandika3 opened a new pull request, #9682:
URL: https://github.com/apache/ozone/pull/9682

   ## What changes were proposed in this pull request?
   
   Currently, in OMRatisHelper#getOMResponseFromRaftClientReply, OMResponse.leaderOMNodeId is set to the replierId (i.e. RaftClientReply.serverId), which is the Raft peer that served the request.
   
   Previously this was fine, since all Ratis requests were write requests that had to be served by the leader. However, now that Ozone supports OM follower reads, OM can no longer use the replierId as the OMResponse.leaderOMNodeId.
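
   For context, here is a minimal sketch of the current mapping, paraphrased from the description above (the exact Ozone/Ratis signatures may differ, and the response parsing is approximated):

```java
// Paraphrased sketch of the current behavior; not verbatim Ozone code.
static OMResponse getOMResponseFromRaftClientReply(RaftClientReply reply)
    throws InvalidProtocolBufferException {
  byte[] data = reply.getMessage().getContent().toByteArray();
  // Problem: serverId is whichever peer served the request, which is no
  // longer necessarily the leader once follower reads are allowed.
  return OMResponse.parseFrom(data).toBuilder()
      .setLeaderOMNodeId(reply.getServerId().toString())
      .build();
}
```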
   
   Hadoop27OmTransport and Hadoop3OmTransport submitRequest will fail over to this new OM node ID. If the field is incorrectly set to a non-leader OM, the subsequent request may wrongly be sent to that follower, which adds to the write latency.
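
   A hedged paraphrase of that transport-side handling (the real `Hadoop3OmTransport#submitRequest` differs in detail; `sendToCurrentProxy` is a hypothetical helper standing in for the actual RPC call):

```java
// Hedged paraphrase; not verbatim Ozone code.
OMResponse omResponse = sendToCurrentProxy(payload); // hypothetical helper
if (omResponse.hasLeaderOMNodeId()) {
  // If a follower served the request, leaderOMNodeId names a non-leader,
  // so the client "fails over" toward that follower for the next request.
  omFailoverProxyProvider.setNextOmProxy(omResponse.getLeaderOMNodeId());
}
```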
   
   I also decided to remove the failover logic from the Hadoop transport, because it causes the client to fail over twice (see the sketch after this list):
   1. setNextOmProxy in Hadoop3OmTransport#submitRequest sets the current proxy to the new OM node ID.
   2. However, selectNextOmProxy will also be triggered in OmFailoverProxyProviderBase#getRetryPolicy.
       - This updates the OM proxy from pointing to the leader to the next OM node.
       - This triggers a long failover, especially since the client has to wait longer once it has already cycled through all the OM nodes.
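
   To make the double failover concrete, here is a hedged sketch of the second trigger. `RetryPolicy` and `RetryAction` are Hadoop's real retry types, but the body is an approximation of `OmFailoverProxyProviderBase#getRetryPolicy`, not its actual code; only the method names come from the description above:

```java
// Hedged sketch; an approximation, not Ozone's implementation.
protected RetryPolicy getRetryPolicy(int maxFailovers) {
  return (exception, retries, failovers, isIdempotent) -> {
    if (failovers >= maxFailovers) {
      return RetryPolicy.RetryAction.FAIL;
    }
    // Second failover: submitRequest already moved the proxy via
    // setNextOmProxy; this advances it again, past the actual leader.
    selectNextOmProxy();
    return RetryPolicy.RetryAction.FAILOVER_AND_RETRY;
  };
}
```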
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-14426
   
   ## How was this patch tested?
   
   IT (Clean CI: https://github.com/ivandika3/ozone/actions/runs/21431935770)
   
   Note: `testOMProxyProviderFailoverToCurrentLeader` is failing because of the removal of the `setNextOmProxy` logic in the Hadoop transport. I removed it because I believe the test is not correct.
   
   The test assumes that if we trigger `OmFailoverProxyProvider#performFailover`, the next OM request will be sent to the new current OM node ID, since `FailoverProxyProvider#getProxy` is already updated to the new OM node ID. However, this is incorrect: the OM client wraps the RPC proxy with `RetryProxy#create`, which uses `RetryInvocationHandler`. `RetryInvocationHandler` caches the current proxy in `ProxyDescriptor.proxyInfo` and only calls `FailoverProxyProvider#getProxy` when there is a failover event, which triggers `ProxyDescriptor#failover`. So even if the test changes the OM node ID to a non-leader, `RetryInvocationHandler` will keep sending to the leader until the leader steps down or gets disconnected. The sketch below illustrates this caching behavior.
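
   Here is a small, self-contained sketch against Hadoop's retry classes (`OmProtocol` and `ToyProvider` are made up for the demo; `RetryProxy`, `RetryPolicies`, and `FailoverProxyProvider` are real Hadoop types):

```java
import java.io.IOException;
import org.apache.hadoop.io.retry.FailoverProxyProvider;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryProxy;

public class FailoverCachingSketch {

  /** Hypothetical protocol interface, for illustration only. */
  public interface OmProtocol {
    String submit(String request);
  }

  /** Toy provider that flips between two in-memory "OM nodes". */
  static class ToyProvider implements FailoverProxyProvider<OmProtocol> {
    private int current = 0;
    private final OmProtocol[] nodes = {
        req -> "om1 handled: " + req,
        req -> "om2 handled: " + req,
    };

    @Override
    public Class<OmProtocol> getInterface() {
      return OmProtocol.class;
    }

    @Override
    public ProxyInfo<OmProtocol> getProxy() {
      return new ProxyInfo<>(nodes[current], "om" + (current + 1));
    }

    @Override
    public void performFailover(OmProtocol currentProxy) {
      current = (current + 1) % nodes.length;
    }

    @Override
    public void close() throws IOException {
    }
  }

  public static void main(String[] args) {
    ToyProvider provider = new ToyProvider();
    OmProtocol proxy = (OmProtocol) RetryProxy.create(
        OmProtocol.class, provider,
        RetryPolicies.failoverOnNetworkException(1));

    System.out.println(proxy.submit("a")); // om1 handled: a

    // Flipping the provider by hand does NOT redirect the next call:
    // RetryInvocationHandler cached om1 in ProxyDescriptor.proxyInfo and
    // re-resolves it only after a failover event raised by the policy.
    provider.performFailover(null);
    System.out.println(proxy.submit("b")); // still: om1 handled: b
  }
}
```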
   
   We could technically make the test pass by removing the cached `ProxyDescriptor.proxyInfo` and making `RetryInvocationHandler.ProxyDescriptor#getProxy` call `FailoverProxyProvider#getProxy` directly. However, this might affect other Hadoop client use cases where `FailoverProxyProvider#getProxy` recreates the proxy every time.
   

