[ https://issues.apache.org/jira/browse/GEODE-7591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126939#comment-17126939 ]
ASF GitHub Bot commented on GEODE-7591: --------------------------------------- kirklund commented on a change in pull request #5182: URL: https://github.com/apache/geode/pull/5182#discussion_r436033017 ########## File path: geode-core/src/distributedTest/java/org/apache/geode/distributed/internal/ClusterDistributionManagerDUnitTest.java ########## @@ -370,6 +371,33 @@ public void testWaitForViewInstallation() { .untilAsserted(() -> assertThat(waitForViewInstallationDone.get()).isTrue()); } + /** + * show that waitForViewInstallation works as expected when distribution manager is closed + * while waiting for the latest membership view to install + */ + @Test + public void testWaitForViewInstallationDisconnectDS() { + InternalDistributedSystem system = getSystem(); + ClusterDistributionManager dm = (ClusterDistributionManager) system.getDM(); + MembershipView<InternalDistributedMember> view = dm.getDistribution().getView(); + + AtomicBoolean waitForViewInstallationDone = new AtomicBoolean(); + executorService.submit(() -> { + try { + dm.waitForViewInstallation(view.getViewId() + 1); + waitForViewInstallationDone.set(true); + } catch (InterruptedException e) { + errorCollector.addError(e); + } + }); + + await().timeout(2000, TimeUnit.MILLISECONDS); Review comment: I don't think await() without an until of some sort does anything. It's the until clauses that start the waiting. I recommend using a CyclicBarrier to sync up the actions of the two threads and a Future to wait for completion: ``` CyclicBarrier cyclicBarrier = new CyclicBarrier(2); Future<Void> future = executorService.submit(() -> { try { cyclicBarrier.await(getTimeout().toMillis(), MILLISECONDS); dm.waitForViewInstallation(view.getViewId() + 1); } catch (InterruptedException e) { errorCollector.addError(e); } }); cyclicBarrier.await(getTimeout().toMillis(), MILLISECONDS); system.disconnect(); future.getTimeout().toMillis(), MILLISECONDS); ``` There's still no guarantee that the executor thread will actually be deep inside dm.waitForViewInstallation when the main thread calls disconnect, but there will be some percentage of CI runs that are hitting this scenario as desired. So as long as the test remains passing 100% of the time, we're confident enough that the scenario is tested and passing. To really guarantee the exact combination of two threads being in specific places would require some crazy test hooks down in dm and/or system which would probably require some unnatural acts of coding (not recommended). Another approach is to write an IntegrationTest or UnitTest in which some of the classes involved are mocks or spies but even then it's nice to have the dunit coverage as well. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > potential hang > -------------- > > Key: GEODE-7591 > URL: https://issues.apache.org/jira/browse/GEODE-7591 > Project: Geode > Issue Type: Improvement > Components: membership > Reporter: Bruce J Schuchardt > Assignee: Jakov Varenina > Priority: Major > > This method in ClusterDistributionManager waits for a new membership view to > be installed, but if the cache is being closed while waiting the method could > hang because it only checks for cache closure if the object it's waiting on > is notified. We should change the wait() to have a timeout so that the > `stopper` is polled periodically > {code:java} > void waitForViewInstallation(long id) throws InterruptedException { > if (id <= membershipViewIdAcknowledged) { > return; > } > synchronized (membershipViewIdGuard) { > while (membershipViewIdAcknowledged < id && > !stopper.isCancelInProgress()) { > if (logger.isDebugEnabled()) { > logger.debug("waiting for view {}. Current DM view processed by all > listeners is {}", id, > membershipViewIdAcknowledged); > } > membershipViewIdGuard.wait(); > } > } > } > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)