keith-turner commented on issue #1004: FLUO-1000 OracleServer race conditions
URL: https://github.com/apache/fluo/pull/1004#issuecomment-359848825

Looking at the latest travis output, I am still seeing some error messages like the following.

```
java.lang.IllegalStateException: instance must be started before calling this method
    at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.getData(CuratorFrameworkImpl.java:363)
    at org.apache.fluo.core.oracle.OracleServer.takeLeadership(OracleServer.java:426)
    at org.apache.curator.framework.recipes.leader.LeaderSelector$WrappedListener.takeLeadership(LeaderSelector.java:536)
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:399)
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:443)
    at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:64)
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:245)
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:239)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```

I suspect this is happening because the CuratorFramework was stopped, however I am not sure. I opened [CURATOR-448](https://issues.apache.org/jira/browse/CURATOR-448) because looking into this I found the error message confusing. The error message leads one to believe that curator was not started yet, however I think you could see the error message when it was stopped.

Looking at Fluo's code, it closes the leaderSelector before closing the curatorFramework. I looked at the implementation of the leaderSelector close method, and it does not wait for the thread it created to terminate. So it's possible that, when the leaderSelector is closed and then the curatorFramework is closed, the thread created by the leaderSelector is still running. It would be good to verify that the state is STOPPED when we see this error message. If it is, I think one possible approach is to do something like the following in the takeLeadership method. However, I am not sure how to write a strong check that ensures the exception came from Curator because of the wrong state.

```java
@Override
public void takeLeadership(CuratorFramework curatorFramework) throws Exception {
  try {
    // existing takeLeadership body goes here
  } catch (IllegalStateException e) {
    // TODO how can we verify this exception came from Curator????
    // Don't want to suppress other illegal state exceptions.
    if (curatorFramework.getState() == CuratorFrameworkState.STOPPED) {
      // log a debug message that this happened
      log.debug("Ignoring IllegalStateException because CuratorFramework is stopped", e);
    } else {
      throw e;
    }
  } finally {
    isLeader = false;

    if (started) {
      // if we stopped the server manually, we shouldn't halt
      Halt.halt("Oracle has lost leadership unexpectedly and is now halting.");
    }
  }
}
```
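
To make the suspected ordering concrete, here is a minimal, Curator-free sketch of the race and of the state check proposed above. Everything in it is hypothetical: `FakeClient` is only a stand-in for a client with a LATENT/STARTED/STOPPED lifecycle, and `readLeaderData` and `simulateCloseBeforeWorkerFinishes` are illustrative names, not Fluo or Curator code. It assumes the real client throws the same `IllegalStateException` in both the not-yet-started and the stopped state, which is what this comment suspects but has not verified.

```java
import java.util.concurrent.CountDownLatch;

final class ShutdownRaceSketch {

  // Hypothetical stand-in for a client whose calls are only legal while STARTED.
  static final class FakeClient {
    enum State { LATENT, STARTED, STOPPED }

    private volatile State state = State.LATENT;

    void start() { state = State.STARTED; }
    void close() { state = State.STOPPED; }
    State getState() { return state; }

    byte[] getData() {
      // Same message for LATENT and STOPPED, which is what makes the error confusing.
      if (state != State.STARTED) {
        throw new IllegalStateException("instance must be started before calling this method");
      }
      return new byte[0];
    }
  }

  // Returns true if an IllegalStateException was suppressed because the client
  // was already stopped; rethrows the exception in every other state.
  static boolean readLeaderData(FakeClient client) {
    try {
      client.getData();
      return false;
    } catch (IllegalStateException e) {
      if (client.getState() == FakeClient.State.STOPPED) {
        return true;
      }
      throw e;
    }
  }

  // Closes the client while a worker thread is still about to use it,
  // mirroring a close() that does not wait for its worker to terminate.
  static boolean simulateCloseBeforeWorkerFinishes() {
    FakeClient client = new FakeClient();
    client.start();
    CountDownLatch clientClosed = new CountDownLatch(1);
    boolean[] suppressed = new boolean[1];

    Thread worker = new Thread(() -> {
      try {
        clientClosed.await(); // the worker outlives the close() call
        suppressed[0] = readLeaderData(client);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    worker.start();

    client.close(); // nothing here waits for the worker
    clientClosed.countDown();
    try {
      worker.join(); // join only so the result can be read safely
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return suppressed[0];
  }

  public static void main(String[] args) {
    System.out.println("suppressed after close: " + simulateCloseBeforeWorkerFinishes());
  }
}
```

The sketch has the same weakness as the proposal: the state check cannot tell whether the exception really came from the client, so an unrelated `IllegalStateException` thrown while the client happens to be stopped would also be suppressed.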
