Re: [jira] [Commented] (CURATOR-466) LeaderSelector gets in an inconsistent state when releasing resources.

kenneth mcfarland Tue, 13 Nov 2018 13:01:52 -0800

OK, I remembered: we switched to LeaderLatch and those errors vanished. Try
using that instead; it's an easy refactor.


You can look at the Fluo codebase if you want to see explicit changes.

Cheers,
Kenny



On Tue, Nov 13, 2018, 12:56 PM kenneth mcfarland <
[email protected] wrote:

> Your error messages look a lot like ones I have been seeing for about a
> year or more. Is it related to the issue below?
>
> https://issues.apache.org/jira/plugins/servlet/mobile#issue/CURATOR-468
>
> We stopped using it and switched to another leader election class because
> of the above issue; it was the only way to kill the spurious exceptions.
>
> When I can sit down and get more detailed info, I'll tell you which
> election method we used.
>
> Cheers!!
>
>
>
> On Tue, Nov 13, 2018, 12:18 PM Mikhail Pryakhin (JIRA) <[email protected]
> wrote:
>
>>
>>     [
>> https://issues.apache.org/jira/browse/CURATOR-466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685697#comment-16685697
>> ]
>>
>> Mikhail Pryakhin commented on CURATOR-466:
>> ------------------------------------------
>>
>> [~randgalt] Thank you.
>>
>> Do I understand correctly that closing only the framework instance is the
>> correct way for a client to give up participation in a leader election process?
>>
>> > LeaderSelector gets in an inconsistent state when releasing resources.
>> > ----------------------------------------------------------------------
>> >
>> >                 Key: CURATOR-466
>> >                 URL: https://issues.apache.org/jira/browse/CURATOR-466
>> >             Project: Apache Curator
>> >          Issue Type: Bug
>> >          Components: Recipes
>> >    Affects Versions: 4.0.1
>> >            Reporter: Mikhail Pryakhin
>> >            Priority: Major
>> >
>> > I'm using the leader election recipe, which worked well until I ran
>> into a problem at application shutdown.
>> > Here is my example:
>> >
>> > {code:java}
>> > CuratorFramework framework = CuratorFrameworkFactory.builder()
>> >     .connectString("localhost:2181")
>> >     .retryPolicy(new RetryOneTime(100))
>> >     .build();
>> > LeaderSelector leaderSelector = new LeaderSelector(
>> >     framework,
>> >     "/path",
>> >     new LeaderSelectorListener() {
>> >         volatile boolean stopped;
>> >         @Override
>> >         public void takeLeadership(CuratorFramework client) throws
>> Exception {
>> >             System.out.println("I'm a new leader!");
>> >             try {
>> >                 while (!Thread.currentThread().isInterrupted() &&
>> !stopped) {
>> >                     TimeUnit.SECONDS.sleep(1);
>> >                 }
>> >             } finally {
>> >                 System.out.println("I'm not a leader anymore..");
>> >             }
>> >         }
>> >         @Override
>> >         public void stateChanged(CuratorFramework client,
>> ConnectionState     newState) {
>> >             if
>> (client.getConnectionStateErrorPolicy().isErrorState(newState)) {
>> >                 stopped = true;
>> >             }
>> >          }
>> >   }
>> > );
>> > framework.start();
>> > leaderSelector.start();
>> > TimeUnit.SECONDS.sleep(5);
>> > leaderSelector.close();   //(1)
>> > framework.close();        //(2){code}
>> >
>> > When I release resources by calling the close method first on the
>> LeaderSelector instance and then on the CuratorFramework instance (lines 1
>> and 2), I always get the following exception:
>> >
>> > {code:java}
>> > java.lang.IllegalStateException: instance must be started before
>> calling this method
>> > at 
>> > org.apache.curator.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
>> ~[curator-client-4.0.1.jar:?]
>> > at
>> org.apache.curator.framework.imps.CuratorFrameworkImpl.delete(CuratorFrameworkImpl.java:424)
>> ~[curator-framework-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:347)
>> ~[curator-recipes-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:124)
>> ~[curator-recipes-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154)
>> ~[curator-recipes-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449)
>> [curator-recipes-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466)
>> [curator-recipes-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65)
>> [curator-recipes-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246)
>> [curator-recipes-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240)
>> [curator-recipes-4.0.1.jar:4.0.1]
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> [?:1.8.0_141]
>> > at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> [?:1.8.0_141]
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> [?:1.8.0_141]
>> > at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> [?:1.8.0_141]
>> > at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> [?:1.8.0_141]
>> > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
>> > {code}
>> >
>> > The reason for the exception is that the non-blocking
>> LeaderSelector.close method delegates the call to the internal executor
>> service, which abruptly cancels the running futures with the
>> mayInterruptIfRunning flag set to true. Right after this, the CuratorFramework
>> close method is called. In the meantime, the future being canceled executes
>> the finally block, where it calls methods on the already closed
>> CuratorFramework instance, which leads to an exception.
>> > I thought I could wait a bit until the LeaderSelector instance is closed,
>> so I tried to delay for some time before closing the CuratorFramework
>> instance, but doing so leads to another exception:
>> > {code:java}
>> > java.lang.InterruptedException: null
>> > at java.lang.Object.wait(Native Method) ~[?:1.8.0_141]
>> > at java.lang.Object.wait(Object.java:502) ~[?:1.8.0_141]
>> > at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1409)
>> ~[zookeeper-3.4.12.jar:3.4.12--1]
>> > at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:874)
>> ~[zookeeper-3.4.12.jar:3.4.12--1]
>> > at
>> org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:274)
>> ~[curator-framework-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.imps.DeleteBuilderImpl$5.call(DeleteBuilderImpl.java:268)
>> ~[curator-framework-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.connection.StandardConnectionHandlingPolicy.callWithRetry(StandardConnectionHandlingPolicy.java:64)
>> ~[curator-client-4.0.1.jar:?]
>> > at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:100)
>> ~[curator-client-4.0.1.jar:?]
>> > at
>> org.apache.curator.framework.imps.DeleteBuilderImpl.pathInForeground(DeleteBuilderImpl.java:265)
>> ~[curator-framework-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:249)
>> ~[curator-framework-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.imps.DeleteBuilderImpl.forPath(DeleteBuilderImpl.java:34)
>> ~[curator-framework-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.locks.LockInternals.deleteOurPath(LockInternals.java:347)
>> ~[curator-recipes-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.locks.LockInternals.releaseLock(LockInternals.java:124)
>> ~[curator-recipes-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:154)
>> ~[curator-recipes-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:449)
>> [curator-recipes-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:466)
>> [curator-recipes-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:65)
>> [curator-recipes-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:246)
>> [curator-recipes-4.0.1.jar:4.0.1]
>> > at
>> org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:240)
>> [curator-recipes-4.0.1.jar:4.0.1]
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> [?:1.8.0_141]
>> > at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> [?:1.8.0_141]
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> [?:1.8.0_141]
>> > at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> [?:1.8.0_141]
>> > at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> [?:1.8.0_141]
>> > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
>> > {code}
>> > This time the exception is caused by the future being canceled with
>> the mayInterruptIfRunning flag set to true in the LeaderSelector close method.
>> > As the LeaderSelector implementation is based on InterProcessMutex,
>> which works with ephemeral nodes, do we really need to clean up manually on
>> shutdown? As far as I know, ephemeral nodes are deleted when the client
>> disconnects.
>> >
>>
>>
>>
>> --
>> This message was sent by Atlassian JIRA
>> (v7.6.3#76005)
>>
>
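
The use-after-close race described in the quoted report can be sketched
without ZooKeeper, using only java.util.concurrent. This is a rough
illustration under stated assumptions, not Curator's code: FakeClient,
shutdown and the gate latch are hypothetical names introduced here, and the
gate exists only to force the unlucky interleaving deterministically. It
models the first trace only (cleanup touching an already closed client); it
does not reproduce the InterruptedException from the second trace.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CancelRaceSketch {

    /** Hypothetical stand-in for a client that rejects calls after close(). */
    static final class FakeClient {
        private volatile boolean started = true;

        void delete() {
            if (!started) {
                throw new IllegalStateException(
                        "instance must be started before calling this method");
            }
        }

        void close() {
            started = false;
        }
    }

    /** Returns "ok" if cleanup succeeded, else the exception's simple name. */
    static String shutdown(boolean waitForWorker) throws Exception {
        FakeClient client = new FakeClient();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch running = new CountDownLatch(1);
        // Assumption: when we do not wait, hold the worker's cleanup back
        // until the client is closed, so the race always goes the bad way.
        CountDownLatch gate = new CountDownLatch(waitForWorker ? 0 : 1);
        CompletableFuture<String> cleanup = new CompletableFuture<>();

        Future<?> leadership = executor.submit(() -> {
            try {
                running.countDown();
                TimeUnit.HOURS.sleep(1);      // pretend to hold leadership
            } catch (InterruptedException e) {
                // cancel(true) interrupted us; fall through to cleanup
            } finally {
                try {
                    gate.await();
                    client.delete();          // cleanup touches the client
                    cleanup.complete("ok");
                } catch (Exception e) {
                    cleanup.complete(e.getClass().getSimpleName());
                }
            }
        });

        running.await();
        leadership.cancel(true);  // returns at once; the task is still unwinding
        executor.shutdown();
        if (waitForWorker) {
            // Block until the worker thread has finished its finally block.
            executor.awaitTermination(5, TimeUnit.SECONDS);
        }
        client.close();
        gate.countDown();
        return cleanup.get(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println("close immediately: " + shutdown(false));
        System.out.println("wait for worker:   " + shutdown(true));
    }
}
```

Running main should report IllegalStateException for the first call and ok
for the second. Note that calling get() on the cancelled future would not
help here, since it throws CancellationException right away instead of
waiting for the task body to finish; the sketch waits on the executor instead.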
