[
https://issues.apache.org/jira/browse/CURATOR-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330212#comment-17330212
]
Jordan Zimmerman commented on CURATOR-595:
------------------------------------------
Are you seeing this in production? This may not really be an issue. Be aware
that Curator's testing servers do not fully reproduce a production
environment. Also, this test uses a single server. Clusters behave
differently.
> InterProcessSemaphoreV2 LOST isn't releasing permits for other clients
> ----------------------------------------------------------------------
>
> Key: CURATOR-595
> URL: https://issues.apache.org/jira/browse/CURATOR-595
> Project: Apache Curator
> Issue Type: Bug
> Components: Recipes
> Affects Versions: 5.1.0
> Reporter: Francesco Nigro
> Assignee: Jordan Zimmerman
> Priority: Major
>
> I'm not sure this is the right place to raise this, but I've added this test
> to TestInterProcessSemaphore:
> {code:java}
> @Test
> public void testAcquireAfterLostServerOnRestart() throws Exception {
>     final int sessionTimeout = 4000;
>     final int connectionTimeout = 2000;
>     // First client: take the only lease, then stop the server and wait for LOST.
>     try (CuratorFramework client = CuratorFrameworkFactory.newClient(
>             server.getConnectString(), sessionTimeout, connectionTimeout, new RetryNTimes(0, 1))) {
>         client.start();
>         client.blockUntilConnected();
>         final InterProcessSemaphoreV2 semaphore = new InterProcessSemaphoreV2(client, "/1", 1);
>         assertNotNull(semaphore.acquire());
>         CountDownLatch lost = new CountDownLatch(1);
>         client.getConnectionStateListenable().addListener((client1, newState) -> {
>             if (newState == ConnectionState.LOST) {
>                 lost.countDown();
>             }
>         });
>         server.stop();
>         lost.await();
>     }
>     server.restart();
>     // Second client: once the first session should have expired, the lease is expected to be free.
>     try (CuratorFramework client = CuratorFrameworkFactory.newClient(
>             server.getConnectString(), sessionTimeout, connectionTimeout, new RetryNTimes(0, 1))) {
>         client.start();
>         client.blockUntilConnected();
>         final InterProcessSemaphoreV2 semaphore = new InterProcessSemaphoreV2(client, "/1", 1);
>         final int serverTick = ZooKeeperServer.DEFAULT_TICK_TIME;
>         Thread.sleep(sessionTimeout + serverTick);
>         // Zero wait time: acquire returns null immediately if the lease is still held.
>         assertNotNull(semaphore.acquire(0, TimeUnit.SECONDS));
>     }
> }
> {code}
> This test does not pass, although the InterProcessSemaphoreV2 documentation states:
> bq. "However, if the client session drops (crash, etc.), any leases held by
> the client are automatically closed and made available to other clients."
> Maybe I'm missing something obvious in the ZK server config instead.
> I just checked this by running the same test across separate processes:
> # start the server in process A
> # acquire a lease in process B, which listens for LOST events and kills itself when one arrives
> # restart the server in process A, which causes process B to kill itself (as expected)
> # acquire a lease in process C, which now succeeds
> It seems that something in the intra-process case is not working as
> expected (to me, at least).
> NOTE: as written in newer comments, raising the timeout doesn't seem to work
> either, and different boxes get different outcomes (making this an
> intermittent failure).
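
One thing that may be worth checking when reasoning about the sleep in the test above is how the requested session timeout relates to the server tick. The sketch below is only arithmetic, not Curator or ZooKeeper code. It assumes the server uses ZooKeeper's documented defaults, where the session timeout is clamped to between 2x and 20x the tick time, and it assumes (not confirmed for Curator's testing server) that the tick is ZooKeeperServer.DEFAULT_TICK_TIME (3000 ms) and that expiry is processed on tick-aligned boundaries, so it can lag by up to one tick. The class and method names are hypothetical.

```java
// Hypothetical helper, for illustration only: not part of Curator or ZooKeeper.
final class SessionTimeoutSketch {

    // Clamp the requested timeout into [2 * tick, 20 * tick], which are the
    // documented defaults for minSessionTimeout and maxSessionTimeout.
    static int negotiatedTimeoutMs(int requestedMs, int tickTimeMs) {
        int min = 2 * tickTimeMs;
        int max = 20 * tickTimeMs;
        return Math.max(min, Math.min(max, requestedMs));
    }

    // ASSUMPTION: expiry is handled on tick-aligned boundaries, so the session
    // may outlive the negotiated timeout by up to one additional tick.
    static int worstCaseExpiryMs(int requestedMs, int tickTimeMs) {
        return negotiatedTimeoutMs(requestedMs, tickTimeMs) + tickTimeMs;
    }

    public static void main(String[] args) {
        int requestedMs = 4000; // session timeout requested by the test
        int tickMs = 3000;      // assumed server tick (DEFAULT_TICK_TIME)
        int sleepMs = requestedMs + tickMs; // what the test sleeps for

        System.out.println("negotiated timeout: " + negotiatedTimeoutMs(requestedMs, tickMs) + " ms");
        System.out.println("worst-case expiry:  " + worstCaseExpiryMs(requestedMs, tickMs) + " ms");
        System.out.println("test sleep:         " + sleepMs + " ms");
    }
}
```

Under these assumptions, the 4000 ms request would be raised to 6000 ms, and the test's 7000 ms sleep would fall short of the worst-case expiry bound. That would be consistent with an intermittent failure, but it is a hypothesis to verify against the actual server configuration, not a diagnosis.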