Mikhail Pryakhin created CURATOR-466:
----------------------------------------

             Summary: LeaderSelector gets in an inconsistent state when 
releasing resources.
                 Key: CURATOR-466
                 URL: https://issues.apache.org/jira/browse/CURATOR-466
             Project: Apache Curator
          Issue Type: Bug
          Components: Recipes
    Affects Versions: 4.0.1
            Reporter: Mikhail Pryakhin


I'm using the leader election recipe, which works well until the application 
shuts down.

Here is my example:

 
{code:java}
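import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderSelector;
import org.apache.curator.framework.recipes.leader.LeaderSelectorListener;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.retry.RetryOneTime;

CuratorFramework framework = CuratorFrameworkFactory.builder()
    .connectString("localhost:2181")
    .retryPolicy(new RetryOneTime(100))
    .build();

LeaderSelector leaderSelector = new LeaderSelector(
    framework,
    "/path",
    new LeaderSelectorListener() {
        // Set when the connection enters an error state, to end leadership.
        volatile boolean stopped;

        @Override
        public void takeLeadership(CuratorFramework client) throws Exception {
            System.out.println("I'm a new leader!");
            try {
                // Leadership is held until this method returns.
                while (!Thread.currentThread().isInterrupted() && !stopped) {
                    TimeUnit.SECONDS.sleep(1);
                }
            } finally {
                System.out.println("I'm not a leader anymore..");
            }
        }

        @Override
        public void stateChanged(CuratorFramework client, ConnectionState newState) {
            if (client.getConnectionStateErrorPolicy().isErrorState(newState)) {
                stopped = true;
            }
        }
    }
);

framework.start();
leaderSelector.start();

TimeUnit.SECONDS.sleep(5);

leaderSelector.close();   //(1)
framework.close();        //(2)
{code}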
 

When I release resources by calling close() first on the LeaderSelector 
instance and then on the CuratorFramework instance (the lines marked (1) and 
(2)), I always get the following exception:

 
{code:java}
java.lang.IllegalStateException: instance must be started before calling this method
    at org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:444) ~[curator-client-4.0.1.jar:?]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.delete(CuratorFrameworkImpl.java:424) ~[curator-framework-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:347) ~[curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:124) ~[curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154) ~[curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449) [curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466) [curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65) [curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246) [curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240) [curator-recipes-4.0.1.jar:4.0.1]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_141]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
{code}
 

The reason for the exception is that the non-blocking LeaderSelector.close 
method delegates the call to the internal executor service, which abruptly 
cancels the running futures with the mayInterruptIfRunning flag set to true. 
Right after this, the CuratorFramework close method is called. Meanwhile, the 
future being canceled executes its finally block, where it calls methods on 
the already closed CuratorFramework instance, which throws an exception.
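
The race described above can be reproduced without Curator. The following is 
a minimal sketch using only java.util.concurrent; FakeClient and every other 
name in it are hypothetical stand-ins, not Curator classes, and the latches 
exist only to force the unlucky interleaving deterministically. It shows that 
Future.cancel(true) returns without waiting for the task's finally block, so a 
resource closed right afterwards may already be gone when the cleanup runs.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

final class CancelRaceSketch {
    /** Hypothetical stand-in for a client that rejects calls once closed. */
    static final class FakeClient {
        private final AtomicBoolean started = new AtomicBoolean(true);

        void delete() {
            if (!started.get()) {
                throw new IllegalStateException("instance must be started before calling this method");
            }
        }

        void close() {
            started.set(false);
        }
    }

    /** Returns whatever the task's cleanup step threw, or null if it succeeded. */
    static Throwable run() throws Exception {
        FakeClient client = new FakeClient();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch running = new CountDownLatch(1);
        CountDownLatch clientClosed = new CountDownLatch(1);
        CountDownLatch cleanupDone = new CountDownLatch(1);
        AtomicReference<Throwable> failure = new AtomicReference<>();

        Future<?> task = executor.submit(() -> {
            try {
                running.countDown();
                Thread.sleep(Long.MAX_VALUE);      // "holding leadership"
            } catch (InterruptedException e) {
                // cancel(true) interrupted us; fall through to cleanup
            } finally {
                try {
                    clientClosed.await();          // force the bad ordering
                    client.delete();               // cleanup on a closed client
                } catch (Throwable t) {
                    failure.set(t);
                } finally {
                    cleanupDone.countDown();
                }
            }
        });

        running.await();
        task.cancel(true);         // returns immediately, does not wait for finally
        client.close();            // the caller closes the client right away
        clientClosed.countDown();
        cleanupDone.await();
        executor.shutdown();
        return failure.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```

In this sketch the cleanup fails with an IllegalStateException because nothing 
makes the caller wait for the cancelled task to finish before closing the 
client.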

I thought I could wait a bit until the LeaderSelector instance is closed, so I 
tried delaying for some time before closing the CuratorFramework instance, but 
doing so leads to another exception:
{code:java}
java.lang.InterruptedException: null
    at java.lang.Object.wait(Native Method) ~[?:1.8.0_141]
    at java.lang.Object.wait(Object.java:502) ~[?:1.8.0_141]
    at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1409) ~[zookeeper-3.4.12.jar:3.4.12--1]
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:874) ~[zookeeper-3.4.12.jar:3.4.12--1]
    at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:274) ~[curator-framework-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:268) ~[curator-framework-4.0.1.jar:4.0.1]
    at org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64) ~[curator-client-4.0.1.jar:?]
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100) ~[curator-client-4.0.1.jar:?]
    at org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:265) ~[curator-framework-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:249) ~[curator-framework-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:34) ~[curator-framework-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:347) ~[curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:124) ~[curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154) ~[curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449) [curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466) [curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65) [curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246) [curator-recipes-4.0.1.jar:4.0.1]
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240) [curator-recipes-4.0.1.jar:4.0.1]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_141]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_141]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_141]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_141]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
{code}
This time the exception is caused by the future being canceled with the 
mayInterruptIfRunning flag set to true in the LeaderSelector close method, so 
the blocking ZooKeeper delete issued during cleanup is interrupted.
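
The general mechanism behind this second failure can be sketched in plain 
Java. This is an illustration only, not Curator's implementation: 
blockingCleanup is a hypothetical stand-in for a synchronous call such as the 
delete in the trace, and the save/clear/restore handling of the interrupt 
status is one possible way to let such a cleanup run, shown here as an 
assumption rather than as the project's fix.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class InterruptedCleanupSketch {
    /** Hypothetical stand-in for a blocking cleanup call. */
    static void blockingCleanup() throws InterruptedException {
        // Blocking methods throw if the thread's interrupt status is set.
        TimeUnit.MILLISECONDS.sleep(10);
    }

    /** Cleanup attempted while the interrupt status is still set. */
    static boolean cleanupWithFlagSet() {
        Thread.currentThread().interrupt();        // simulates cancel(true)
        try {
            blockingCleanup();
            return true;
        } catch (InterruptedException e) {
            return false;                          // cleanup was aborted
        }
    }

    /** Cleanup attempted after saving and clearing the interrupt status. */
    static boolean cleanupWithFlagCleared() {
        Thread.currentThread().interrupt();        // simulates cancel(true)
        boolean wasInterrupted = Thread.interrupted();   // reads and clears
        try {
            blockingCleanup();
            return true;
        } catch (InterruptedException e) {
            return false;
        } finally {
            if (wasInterrupted) {
                Thread.currentThread().interrupt();      // restore for callers
            }
        }
    }

    /** Runs the body on its own thread so the caller's status is untouched. */
    static boolean runOnFreshThread(BooleanSupplier body) throws InterruptedException {
        boolean[] result = new boolean[1];
        Thread t = new Thread(() -> result[0] = body.getAsBoolean());
        t.start();
        t.join();
        return result[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("flag set:     " + runOnFreshThread(InterruptedCleanupSketch::cleanupWithFlagSet));
        System.out.println("flag cleared: " + runOnFreshThread(InterruptedCleanupSketch::cleanupWithFlagCleared));
    }
}
```

With the interrupt status left set, the cleanup call is aborted before it can 
complete; once the status is cleared first, the same call goes through.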

Since the LeaderSelector implementation is based on InterProcessMutex, which 
works with ephemeral nodes, do we really need to clean up manually on shutdown? 
As far as I know, ephemeral nodes are deleted when the client's session ends.

