neils-dev commented on a change in pull request #3160:
URL: https://github.com/apache/ozone/pull/3160#discussion_r822229881



##########
File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -339,39 +341,27 @@ protected Text computeDelegationTokenService() {
 
   /**
    * Called whenever an error warrants failing over. It is determined by the
-   * retry policy.
-   *
-   * This is a dummy call from {@link RetryInvocationHandler}. The actual
-   * failover should be performed using either
-   * {@link OMFailoverProxyProvider#performFailoverIfRequired(String)} or
-   * {@link OMFailoverProxyProvider#performFailoverToNextProxy()}.
-   *
-   * In {@link OzoneManagerProtocolClientSideTranslatorPB}, we first
-   * manually failover and then call the RetryAction FAILOVER_AND_RETRY. This
-   * is done because we do not want to always failover to the next proxy. If we
-   * get a OMNotLeaderException with a suggested leader, then we want to
-   * failover to that OM proxy instead. Hence, we failover manually and the
-   * {@link FailoverProxyProvider#performFailover(Object)} call should not do
-   * failover again.
+   * retry policy. This method is supposed to called only once in a
+   * multithreaded environment. This where the fail over occurs.
    */

Review comment:
       Not important - cosmetic, above comment reads _fail over_ two separate 
words, instead of the action `failover`

##########
File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -261,9 +263,9 @@ public RetryAction shouldRetry(Exception exception, int 
retries,
             //  address of the suggested leader along with the nodeID.
             //  Failing over just based on nodeID is not very robust.
 
-            // OMFailoverProxyProvider#performFailover() is a dummy call and
-            // does not perform any failover. Failover manually to the next OM.
-            performFailoverToNextProxy();
+            // Prepare the next OM to be tried. This will help with calculation
+            // of the wait times needed get creating the retryAction.
+            selectNextOmProxy();

Review comment:
       With multiple threads, in the case when the `nextOmProxyIndex` is set by 
one thread calling `shouldRetry` and another thread executes `shouldRetry` 
prior to the `performFailover` (setting of the` currentOmNodeId`), the 
_nextOmProxyIndex_ could be incremented before the OmProxy is tried.  Can this 
be a problem?  Should we not check if the _nextOmProxyIndex_ == 
`currentOmProxyIndex` before trying the next proxy (calling 
`selectNextOmProxy`), some sort of synchronized check?
   If `nextOmProxyIndex` != `currentOmProxyIndex `(unsynchronized), then it 
implies another thread is in failover and has just executed `shouldRetry` prior 
to calling `performFailover`.  We need to try the current `nextOmProxyIndex` 
before trying other.

##########
File path: 
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -115,8 +115,10 @@ public OMFailoverProxyProvider(ConfigurationSource 
configuration,
     loadOMClientConfigs(conf, this.omServiceId);

Review comment:
       @kerneltime Thanks for doing this patch and bringing up this concurrency 
issue in our PR for HDDS-5544 support for OM HA.  Following up on the note to 
avoid using `shouldRetry` for failover and instead use `performFailover` as 
implemented here, I found the following comment useful from the scm failover 
provider code,
     _// performFailOver() updates the currentProxyOmNodeId
     // When 2 or more threads run in parallel, the
     // RetryInvocationHandler will check the expectedFailOverCount
     // and not execute performFailOver() for one of them. So the other 
thread(s)
     // shall not call performFailOver(), instead it will call getProxy()._
   
   It would be great to add a similar comment here in the code as well.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to