[ 
https://issues.apache.org/jira/browse/CURATOR-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17330233#comment-17330233
 ] 

Francesco Nigro edited comment on CURATOR-595 at 4/23/21, 9:03 AM:
-------------------------------------------------------------------

{quote}There's nothing Curator can do if ZK doesn't delete the ephemeral node.
{quote}
What I find odd is that if I replace STOP with SUSPEND, everything works 
fine. Maybe Curator is sending ZK a request to delete the lease?
{quote}Are you seeing this in Production? This may not really be an issue. Be 
aware that Curator's testing servers are not totally accurate to a production 
environment. Also, this test is with a single server. Clusters behave 
differently.
{quote}
It happens in the unit tests we use to validate our releases. I can try 
with a clustered test server and with a test containers ZK instance to see 
whether it works any better.



> InterProcessSemaphoreV2 LOST isn't releasing permits for other clients
> ----------------------------------------------------------------------
>
>                 Key: CURATOR-595
>                 URL: https://issues.apache.org/jira/browse/CURATOR-595
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Recipes
>    Affects Versions: 5.1.0
>            Reporter: Francesco Nigro
>            Assignee: Jordan Zimmerman
>            Priority: Major
>
> I'm not sure this is the right place to raise this, but I've added this test 
> on TestInterProcessSemaphore:
> {code:java}
>     @Test
>     public void testAcquireAfterLostServerOnRestart() throws Exception {
>         final int sessionTimeout = 4000;
>         final int connectionTimeout = 2000;
>         // First client: take the only lease, stop the server, wait for LOST.
>         try (CuratorFramework client = CuratorFrameworkFactory.newClient(
>                 server.getConnectString(), sessionTimeout, connectionTimeout,
>                 new RetryNTimes(0, 1))) {
>             client.start();
>             client.blockUntilConnected();
>             final InterProcessSemaphoreV2 semaphore =
>                 new InterProcessSemaphoreV2(client, "/1", 1);
>             assertNotNull(semaphore.acquire());
>             CountDownLatch lost = new CountDownLatch(1);
>             client.getConnectionStateListenable().addListener((client1, newState) -> {
>                 if (newState == ConnectionState.LOST) {
>                     lost.countDown();
>                 }
>             });
>             server.stop();
>             lost.await();
>         }
>         server.restart();
>         // Second client: after the session timeout the lease is expected to be free.
>         try (CuratorFramework client = CuratorFrameworkFactory.newClient(
>                 server.getConnectString(), sessionTimeout, connectionTimeout,
>                 new RetryNTimes(0, 1))) {
>             client.start();
>             client.blockUntilConnected();
>             final InterProcessSemaphoreV2 semaphore =
>                 new InterProcessSemaphoreV2(client, "/1", 1);
>             final int serverTick = ZooKeeperServer.DEFAULT_TICK_TIME;
>             Thread.sleep(sessionTimeout + serverTick);
>             assertNotNull(semaphore.acquire(0, TimeUnit.SECONDS));
>         }
>     }
> {code}
> This test does not pass. The documentation of InterProcessSemaphoreV2 states:
> bq. "However, if the client session drops (crash, etc.), any leases held by 
> the client are automatically closed and made available to other clients." 
> Maybe I'm missing something obvious in the ZK server config instead.
> I just checked this by running the same test across separate processes:
> # start the server in process A
> # acquire a lease in process B, which listens for LOST events and kills 
> itself when one arrives
> # restart the server in process A, which causes process B to kill itself 
> (as expected)
> # acquire a lease in process C, which now succeeds
> It seems that something in the single-process case is not working as 
> expected (to me, at least).
> NOTE: as written in newer comments, raising the timeout doesn't seem to 
> help either, and different boxes are getting different outcomes (making this 
> an intermittent failure).
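
The expectation behind the test in the issue description can be pictured with a toy model. In ZooKeeper, ephemeral nodes are removed by the server when the owning session is closed or expires, so a client that only observes LOST locally has not necessarily freed anything on the server yet. The sketch below is not ZooKeeper or Curator code: the class `ToyLeaseModel` and its methods are hypothetical and exist only to show the order of events the test relies on.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: a stand-in for server-side ephemeral lease nodes (hypothetical, not Curator). */
final class ToyLeaseModel {
    // lease name -> id of the session that owns it
    private final Map<String, Long> leases = new HashMap<>();
    private final int maxLeases;
    private long nextLeaseId = 0;

    ToyLeaseModel(int maxLeases) {
        this.maxLeases = maxLeases;
    }

    /** Returns the new lease name, or null when every permit is taken. */
    String tryAcquire(long sessionId) {
        if (leases.size() >= maxLeases) {
            return null;
        }
        String name = "lease-" + (nextLeaseId++);
        leases.put(name, sessionId);
        return name;
    }

    /** Server-side session expiry: drops every lease owned by that session. */
    void expireSession(long sessionId) {
        leases.values().removeIf(owner -> owner == sessionId);
    }

    public static void main(String[] args) {
        ToyLeaseModel server = new ToyLeaseModel(1);
        String first = server.tryAcquire(1L);    // session 1 takes the only permit
        // Session 1's client sees LOST locally; the model's server state is unchanged.
        String blocked = server.tryAcquire(2L);  // still taken, so this is null
        server.expireSession(1L);                // the server expires session 1
        String second = server.tryAcquire(2L);   // the permit is free again
        System.out.println(first + " " + blocked + " " + second);
    }
}
```

In this model the second acquire succeeds only after `expireSession` runs, which is the step the failing test assumes has happened by the time the sleep finishes.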



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
