neils-dev commented on a change in pull request #3160:
URL: https://github.com/apache/ozone/pull/3160#discussion_r822229881
##########
File path:
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -339,39 +341,27 @@ protected Text computeDelegationTokenService() {
/**
* Called whenever an error warrants failing over. It is determined by the
- * retry policy.
- *
- * This is a dummy call from {@link RetryInvocationHandler}. The actual
- * failover should be performed using either
- * {@link OMFailoverProxyProvider#performFailoverIfRequired(String)} or
- * {@link OMFailoverProxyProvider#performFailoverToNextProxy()}.
- *
- * In {@link OzoneManagerProtocolClientSideTranslatorPB}, we first
- * manually failover and then call the RetryAction FAILOVER_AND_RETRY. This
- * is done because we do not want to always failover to the next proxy. If we
- * get a OMNotLeaderException with a suggested leader, then we want to
- * failover to that OM proxy instead. Hence, we failover manually and the
- * {@link FailoverProxyProvider#performFailover(Object)} call should not do
- * failover again.
+ * retry policy. This method is supposed to called only once in a
+ * multithreaded environment. This where the fail over occurs.
*/
Review comment:
Not important - cosmetic: the comment above reads _fail over_ as two
separate words instead of the single word `failover`.
##########
File path:
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -261,9 +263,9 @@ public RetryAction shouldRetry(Exception exception, int
retries,
// address of the suggested leader along with the nodeID.
// Failing over just based on nodeID is not very robust.
- // OMFailoverProxyProvider#performFailover() is a dummy call and
- // does not perform any failover. Failover manually to the next OM.
- performFailoverToNextProxy();
+ // Prepare the next OM to be tried. This will help with calculation
+ // of the wait times needed get creating the retryAction.
+ selectNextOmProxy();
Review comment:
With multiple threads, if `nextOmProxyIndex` is set by one thread calling
`shouldRetry` and another thread executes `shouldRetry` before
`performFailover` runs (which sets `currentOmNodeId`), then
`nextOmProxyIndex` could be incremented before that OmProxy is ever tried.
Can this be a problem? Should we check whether `nextOmProxyIndex` ==
`currentOmProxyIndex` before advancing to the next proxy (calling
`selectNextOmProxy`), with some sort of synchronized check?
If `nextOmProxyIndex` != `currentOmProxyIndex` (unsynchronized), it
implies another thread is mid-failover and has executed `shouldRetry` but
not yet called `performFailover`. We need to try the current
`nextOmProxyIndex` before advancing to another.
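To make the concern concrete, here is a minimal standalone sketch of the synchronized check suggested above. The class and member names (`FailoverIndexSketch`, `currentProxyIndex`, `nextProxyIndex`) are hypothetical illustrations based on this review, not the actual `OMFailoverProxyProvider` fields; the idea is only that `selectNextOmProxy` must not advance past a proxy that has not been tried yet.

```java
// Hypothetical sketch of the proposed guard; names are assumptions,
// not the real OMFailoverProxyProvider API.
public class FailoverIndexSketch {
  private int currentProxyIndex = 0;
  private int nextProxyIndex = 0;
  private final int proxyCount;

  public FailoverIndexSketch(int proxyCount) {
    this.proxyCount = proxyCount;
  }

  // Advance to the next proxy only if no other thread has already
  // selected one that has not yet been tried.
  public synchronized int selectNextOmProxy() {
    if (nextProxyIndex == currentProxyIndex) {
      // No failover in flight: pick the next OM in round-robin order.
      nextProxyIndex = (currentProxyIndex + 1) % proxyCount;
    }
    // Otherwise another thread already moved nextProxyIndex; reuse it
    // so that proxy gets tried before advancing again.
    return nextProxyIndex;
  }

  // Called on the actual failover; commits the selected proxy.
  public synchronized void performFailover() {
    currentProxyIndex = nextProxyIndex;
  }
}
```

With this guard, a second thread entering `shouldRetry` before the first thread's `performFailover` completes would see `nextProxyIndex != currentProxyIndex` and retry the already-selected proxy instead of skipping it.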
##########
File path:
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/ha/OMFailoverProxyProvider.java
##########
@@ -115,8 +115,10 @@ public OMFailoverProxyProvider(ConfigurationSource
configuration,
loadOMClientConfigs(conf, this.omServiceId);
Review comment:
@kerneltime Thanks for doing this patch and bringing up this concurrency
issue in our PR for HDDS-5544 support for OM HA. Following up on the note
to avoid using `shouldRetry` for failover and instead use
`performFailover` as implemented here, I found the following comment from
the SCM failover provider code useful:
_// performFailOver() updates the currentProxyOmNodeId
// When 2 or more threads run in parallel, the
// RetryInvocationHandler will check the expectedFailOverCount
// and not execute performFailOver() for one of them. So the other
// thread(s) shall not call performFailOver(), instead it will call
// getProxy()._
It would be great to add a similar comment here in the code as well.
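As a rough illustration of the pattern that quoted comment describes, here is a minimal sketch of a failover-count guard: only the thread whose expected count still matches performs the failover, while later threads detect the newer count and simply re-fetch the proxy. The class and method names (`FailoverCountGuard`, `snapshot`, `failoverIfCurrent`) are hypothetical, not the real `RetryInvocationHandler` internals.

```java
// Hypothetical sketch of the expectedFailOverCount pattern; this is
// not the actual Hadoop RetryInvocationHandler implementation.
public class FailoverCountGuard {
  private long failoverCount = 0;

  // Each caller records the count it observed when its RPC started.
  public synchronized long snapshot() {
    return failoverCount;
  }

  // Only the first thread whose snapshot still matches performs the
  // failover; any later thread sees a newer count, skips the failover,
  // and would just re-fetch the (already updated) proxy.
  public synchronized boolean failoverIfCurrent(long expectedCount) {
    if (expectedCount == failoverCount) {
      failoverCount++;   // the single failover for this failure
      return true;
    }
    return false;        // another thread already failed over
  }
}
```

This shows why `performFailover` ends up being called once per failure even with many concurrent RPC threads, which is the property the code comment should spell out.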
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]