[
https://issues.apache.org/jira/browse/HDDS-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17346759#comment-17346759
]
Bharat Viswanadham commented on HDDS-5228:
------------------------------------------
Even if RpcClient is shared across threads, they all will have the same
FailoverProxyProvider. If the 1st thread fails over and discovers the leader
OM, all the subsequent requests (from any thread) will be directed to the
correct OM. I do not see how the retry count will be exhausted because of
shared threads. Please let me know if I am missing something here.
The problem here is that we update currentProxyNodeId in
RetryPolicy#shouldRetry.
Let's say two threads, T1 and T2, are both contacting OM1, and OM1 is down.
T1 updates the proxy to OM2 and updates the proxy in proxyDescriptor.
T2 updates the proxy to OM3 and updates the proxy in proxyDescriptor.
So if T1 and T2 are running in parallel, once T1 has updated the proxy, T2
should not update it again.
RetryInvocationHandler handles this case by comparing the expected
failOverCount and skipping the call to performFailOver when it does not
match, but our performFailOver is a no-op and currentProxyNodeId is updated
in shouldRetry, so that check does not protect us.
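
To illustrate the guard described above, here is a minimal sketch of a
failover that is skipped when another thread has already failed over. The
class FailoverGuard, its methods, and the node names are hypothetical and
are not the Ozone or Hadoop code; it only shows the idea of comparing an
expected failover count under a lock.

```java
import java.util.List;

// Hypothetical illustration only; not the Ozone or Hadoop implementation.
class FailoverGuard {
  private final List<String> nodeIds;
  private int currentIndex = 0;
  private int failoverCount = 0;

  FailoverGuard(List<String> nodeIds) {
    this.nodeIds = List.copyOf(nodeIds);
  }

  // A caller records this value before it sends a request.
  synchronized int getFailoverCount() {
    return failoverCount;
  }

  synchronized String currentNodeId() {
    return nodeIds.get(currentIndex);
  }

  // Moves to the next node only if no other thread has failed over since
  // the caller read expectedFailoverCount. Returns true if this call
  // actually changed the current node.
  synchronized boolean failover(int expectedFailoverCount) {
    if (failoverCount != expectedFailoverCount) {
      return false; // someone else already failed over; reuse their result
    }
    currentIndex = (currentIndex + 1) % nodeIds.size();
    failoverCount++;
    return true;
  }

  public static void main(String[] args) {
    FailoverGuard guard = new FailoverGuard(List.of("om1", "om2", "om3"));
    // T1 and T2 both start their calls while om1 is the current node.
    int seenByT1 = guard.getFailoverCount();
    int seenByT2 = guard.getFailoverCount();
    guard.failover(seenByT1); // T1 moves the proxy to om2
    guard.failover(seenByT2); // T2 is stale, so nothing changes
    System.out.println(guard.currentNodeId()); // prints om2
  }
}
```

With the node update inside the guarded method, T2 retries against om2
rather than pushing the shared proxy on to om3.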
We recently fixed this for SCM; for more information, see:
https://github.com/apache/ozone/pull/2249#issue-645725169
> Make OM FailOverProxyProvider work across threads
> -------------------------------------------------
>
> Key: HDDS-5228
> URL: https://issues.apache.org/jira/browse/HDDS-5228
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Bharat Viswanadham
> Assignee: Bharat Viswanadham
> Priority: Major
>
> Use performFailover to perform the failover instead of updating the proxy
> in RetryPolicy#shouldRetry.
> With the current behavior, if RpcClient is shared across threads, it will
> unnecessarily exhaust the retry count.