echobravopapa commented on a change in pull request #5839:
URL: https://github.com/apache/geode/pull/5839#discussion_r544649359
##########
File path:
geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeave.java
##########
@@ -428,6 +432,24 @@ public boolean join() throws MemberStartupException {
}
}
+ boolean hasCoordinatorJoinedCluster(int viewId, long retrySleep)
Review comment:
Looks like this extraction was test-driven.
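The branch removed from `join()` in the next hunk suggests what the extracted method decides. The sketch below is a hypothetical reconstruction inferred from those removed lines only. It is not the PR's actual body, which this hunk does not show, and `CoordinatorCheckSketch` is an illustrative name. It returns `false` after sleeping when the candidate coordinator's view ID is negative, and `true` otherwise, so that the caller resets its retry counters.

```java
// Hypothetical reconstruction for illustration only; not the PR's code.
// Inferred from the branch that the diff removes from GMSJoinLeave.join().
final class CoordinatorCheckSketch {

  private CoordinatorCheckSketch() {}

  /**
   * A negative view ID means the candidate coordinator has not joined a view yet:
   * wait retrySleep ms before the next attempt and report false. Otherwise report
   * true so the caller can reset its retry counters and keep trying.
   */
  static boolean hasCoordinatorJoinedCluster(int viewId, long retrySleep)
      throws InterruptedException {
    if (viewId < 0) {
      Thread.sleep(retrySleep); // pause before the next attempt to find a coordinator
      return false;
    }
    return true;
  }
}
```

Because a method shaped like this takes plain values rather than the whole join state, a unit test can call it directly, for example with `viewId = -1` and `retrySleep = 0`. That is presumably what makes the extraction test-friendly.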
##########
File path:
geode-membership/src/main/java/org/apache/geode/distributed/internal/membership/gms/membership/GMSJoinLeave.java
##########
@@ -383,40 +383,44 @@ public boolean join() throws MemberStartupException {
break;
}
}
- try {
- if (found && !state.hasContactedAJoinedLocator) {
- // if locators are restarting they may be handing out IDs from a stale view that
- // we should go through quickly. Otherwise we should sleep a bit to let failure
- // detection select a new coordinator
- if (state.possibleCoordinator.getVmViewId() < 0) {
- logger.debug("sleeping for {} before making another attempt to find the coordinator",
- retrySleep);
- Thread.sleep(retrySleep);
- } else {
+ if (found && !state.hasContactedAJoinedLocator) {
+ try {
+ if (hasCoordinatorJoinedCluster(state.possibleCoordinator.getVmViewId(), retrySleep)) {
// since we were given a coordinator that couldn't be used we should keep trying
tries = 0;
giveupTime = System.currentTimeMillis() + timeout;
}
+ } catch (InterruptedException e) {
+ Thread.currentThread().interrupt();
+ throw new MembershipConfigurationException(
+ "Retry sleep interrupted. Giving up on joining the distributed system.");
}
- } catch (InterruptedException e) {
- logger.debug("retry sleep interrupted - giving up on joining the distributed system");
- return false;
}
} // for
if (!this.isJoined) {
logger.debug("giving up attempting to join the distributed system after "
+ (System.currentTimeMillis() - startTime) + "ms");
- }
- // to preserve old behavior we need to throw a MemberStartupException if
- // unable to contact any of the locators
- if (!this.isJoined && state.hasContactedAJoinedLocator) {
- throw new MemberStartupException("Unable to join the distributed system in "
- + (System.currentTimeMillis() - startTime) + "ms");
- }
+ // to preserve old behavior we need to throw a MemberStartupException if
+ // unable to contact any of the locators
+ if (state.hasContactedAJoinedLocator) {
+ throw new MemberStartupException("Unable to join the distributed system in "
+ + (System.currentTimeMillis() - startTime) + "ms");
+ }
- return this.isJoined;
+ if (state.locatorsContacted == 0) {
+ throw new MembershipConfigurationException(
+ "Unable to join the distributed system. Could not contact any of the locators: "
Review comment:
As I understand this change, this section is the critical part of the
error-reporting improvement, but I don't see a unit test for the improvement itself.
There is a new test that looks for the exception string above; it seems a test
could also validate that the expanded error information is included in the message.
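Here is a rough sketch of the kind of check suggested above, in plain Java so it runs standalone. `JoinFailureSketch`, `ConfigurationFailure`, `failIfNoLocatorsContacted`, `captureMessage` and `assertMessageNamesEveryLocator` are all hypothetical stand-ins for `GMSJoinLeave.join()` and `MembershipConfigurationException`. The hunk is cut off after `"locators: "`, so the sketch assumes the message appends the configured locator list. A real test would assert on whatever `GMSJoinLeave` actually appends.

```java
import java.util.List;

// Hypothetical stand-ins for illustration; not Geode's API.
final class JoinFailureSketch {

  private JoinFailureSketch() {}

  /** Local stand-in for MembershipConfigurationException. */
  static final class ConfigurationFailure extends RuntimeException {
    ConfigurationFailure(String message) {
      super(message);
    }
  }

  /** Throws when no locator was reachable. ASSUMPTION: the message lists the locators tried. */
  static void failIfNoLocatorsContacted(int locatorsContacted, List<String> locators) {
    if (locatorsContacted == 0) {
      throw new ConfigurationFailure(
          "Unable to join the distributed system. Could not contact any of the locators: "
              + locators);
    }
  }

  /** Runs the failing path and returns the exception message. */
  static String captureMessage(List<String> locators) {
    try {
      failIfNoLocatorsContacted(0, locators);
    } catch (ConfigurationFailure e) {
      return e.getMessage();
    }
    throw new AssertionError("expected ConfigurationFailure");
  }

  /** Checks the fixed prefix AND that every configured locator is named in the message. */
  static void assertMessageNamesEveryLocator(List<String> locators) {
    String message = captureMessage(locators);
    if (!message.startsWith("Unable to join the distributed system.")) {
      throw new AssertionError("missing prefix: " + message);
    }
    for (String locator : locators) {
      if (!message.contains(locator)) {
        throw new AssertionError("missing locator " + locator + " in: " + message);
      }
    }
  }
}
```

The second assertion is the point. Matching only the fixed prefix would still pass if the added detail never reached the caller. In a JUnit test the same idea would typically be an exception assertion on the message.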
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]