keith-turner commented on issue #1004: FLUO-1000 OracleServer race conditions
URL: https://github.com/apache/fluo/pull/1004#issuecomment-359848825
 
 
   Looking at the latest Travis output, I am still seeing error messages 
like the following.  
   
   ```
   java.lang.IllegalStateException: instance must be started before calling this method
        at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
        at org.apache.curator.framework.imps.CuratorFrameworkImpl.getData(CuratorFrameworkImpl.java:363)
        at org.apache.fluo.core.oracle.OracleServer.takeLeadership(OracleServer.java:426)
        at org.apache.curator.framework.recipes.leader.LeaderSelector$WrappedListener.takeLeadership(LeaderSelector.java:536)
        at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:399)
        at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:443)
        at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:64)
        at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:245)
        at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:239)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
   ```
   
   I suspect this is happening because the CuratorFramework was stopped, 
but I am not sure.  I opened 
[CURATOR-448](https://issues.apache.org/jira/browse/CURATOR-448) because, 
while looking into this, I found the error message confusing.  It leads one 
to believe that Curator was not started yet, but I think the same message 
can also appear after Curator has been stopped.
   
   Looking at Fluo's code, it closes the leaderSelector before closing the 
curatorFramework.  I looked at the implementation of the leaderSelector close 
method, and it does not wait for the thread it created to terminate.  So it's 
possible that, when the leaderSelector is closed and then the curatorFramework 
is closed, the thread created by the leaderSelector is still running.
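
If the race is confirmed, one way to narrow it would be to make shutdown wait for `takeLeadership` to return before the curatorFramework is closed. Below is a minimal sketch of that idea using only the JDK. `LeadershipGate` and its method names are hypothetical and do not exist in Fluo or Curator. It also does not cover a leadership thread that has been scheduled but has not yet called `enter()`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: lets shutdown wait until takeLeadership() has returned. */
class LeadershipGate {
  // Starts open (count 0) so waiting is a no-op if leadership was never taken.
  private volatile CountDownLatch exited = new CountDownLatch(0);

  /** Call at the top of takeLeadership(). */
  void enter() {
    exited = new CountDownLatch(1);
  }

  /** Call in the finally block of takeLeadership(). */
  void exit() {
    exited.countDown();
  }

  /**
   * Call after closing the leaderSelector and before closing the
   * curatorFramework. Returns false if the timeout elapsed first.
   */
  boolean awaitExit(long timeout, TimeUnit unit) throws InterruptedException {
    return exited.await(timeout, unit);
  }
}
```

The intended ordering in the server's stop path would be: close the leaderSelector, call `awaitExit` with a bounded timeout, then close the curatorFramework. A timeout is used so that a stuck leadership thread cannot block shutdown forever.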
   
   It would be good to verify that the state is STOPPED when we see this error 
message.  If it is, I think one possible approach is to do something like the 
following in the takeLeadership method.  However, I am not sure how to write a 
strong check that ensures the exception came from Curator because of the wrong 
state.
   
   ```java
     // Requires: import org.apache.curator.framework.imps.CuratorFrameworkState;
     @Override
     public void takeLeadership(CuratorFramework curatorFramework) throws Exception {
   
       try {
         // existing takeLeadership body (elided here)
       } catch (IllegalStateException e) {
         // TODO how can we verify this exception came from Curator?  Don't want
         // to suppress other illegal state exceptions.
         if (curatorFramework.getState() == CuratorFrameworkState.STOPPED) {
           // Curator was already closed, so this is expected during shutdown.
           log.debug("Ignoring IllegalStateException because Curator is stopped", e);
         } else {
           throw e;
         }
       } finally {
         isLeader = false;
   
         if (started) {
           // if we stopped the server manually, we shouldn't halt
           Halt.halt("Oracle has lost leadership unexpectedly and is now halting.");
         }
       }
     }
   ```
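
For the TODO above, one heuristic (not a strong guarantee) would be to inspect the exception's stack trace and check whether a Curator frame appears before the first frame of our own class. The sketch below uses only the JDK. `CuratorExceptions` and `thrownInsideCurator` are hypothetical names, and the check assumes Curator's classes live under `org.apache.curator.`, which would not hold if the dependency were shaded or relocated.

```java
/** Hypothetical helper; not part of Fluo or Curator. */
class CuratorExceptions {
  private static final String CURATOR_PACKAGE = "org.apache.curator.";

  /**
   * Walks the stack trace from the throw site outward. Returns true if a
   * Curator frame is found before the first frame belonging to callerClassName
   * (or one of its nested classes), i.e. the exception was raised somewhere
   * inside a Curator call made by the caller.
   */
  static boolean thrownInsideCurator(Throwable e, String callerClassName) {
    for (StackTraceElement frame : e.getStackTrace()) {
      String cls = frame.getClassName();
      if (cls.startsWith(callerClassName)) {
        return false; // reached our own code without seeing a Curator frame
      }
      if (cls.startsWith(CURATOR_PACKAGE)) {
        return true;
      }
    }
    return false;
  }
}
```

Stopping at the caller's own frame matters here because `LeaderSelector` always sits further down the stack of `takeLeadership`, so scanning the whole trace would match every exception. In the catch block this could be combined with the state check, for example by passing `OracleServer.class.getName()` as `callerClassName`.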
