[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475868#comment-13475868
 ] 

Rakesh R commented on BOOKKEEPER-424:
-------------------------------------

Thanks Uma for your time and reviews. Could you please give your opinion on the 
following.

{quote}There will be a small race here, which may result in missed events.
The race is between the event and re-registration. The original watcher set up 
by the newConnectedZK method in ZKutil does not handle expired events. So, if 
the expired event arrives just before re-registering, we may miss handling it.
Later operations may throw a connection loss exception and may miss proper 
cleanup.{quote}
Actually, the window between the SyncConnected event and the Expired event is 
the session timeout configured for the client, so I couldn't see any race 
between the SyncConnected event and the expiry event. I agree there is a high 
chance of the Disconnected event being sent to the previous watcher in ZKUtil, 
but Bookie doesn't do any event handling other than logging.
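
To make the ordering concrete, here is a toy model of the watcher swap. It is 
only a sketch with no ZooKeeper dependency: FakeSession, the Event enum and the 
watcher labels are hypothetical stand-ins, not the real client API. It shows an 
event that arrives before re-registration going to the old watcher, and later 
events going to the new one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy model of swapping watchers on a session; all names are hypothetical. */
class WatcherSwapSketch {
    enum Event { SYNC_CONNECTED, DISCONNECTED, EXPIRED }

    /** Hypothetical stand-in for a client session that holds one default watcher. */
    static final class FakeSession {
        private Consumer<Event> watcher;

        FakeSession(Consumer<Event> initial) { this.watcher = initial; }

        /** Replaces the current watcher; earlier events are not replayed. */
        void register(Consumer<Event> next) { this.watcher = next; }

        /** Delivers an event to whichever watcher is registered right now. */
        void fire(Event e) { watcher.accept(e); }
    }

    /** Replays the sequence discussed above and returns who saw which event. */
    static List<String> run() {
        List<String> log = new ArrayList<>();
        FakeSession session = new FakeSession(e -> log.add("connect-watcher:" + e));
        session.fire(Event.SYNC_CONNECTED);
        session.fire(Event.DISCONNECTED);   // arrives before re-registration
        session.register(e -> log.add("bookie-watcher:" + e));
        session.fire(Event.EXPIRED);        // arrives after re-registration
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this model only the Disconnected event lands on the old watcher, which 
matches the case I described: it is logged and nothing else depends on it.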

Consider the corner case where the client is very busy (GC pauses or similar) 
and the session expires before the new watcher is registered. Even in this 
case, further down, right after instantiation, the bookie uses the zkclient to 
check the environment (see the code below). That call would get a 
ConnectionLossException and the bookie would consequently shut down.
{code}
        this.zk = instantiateZookeeperClient(conf);
        checkEnvironment(this.zk);
{code}
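
The fail-fast behaviour I am relying on can be sketched as follows. This is a 
minimal model under stated assumptions, not the Bookie code: FakeZkClient, 
SessionState and start() are hypothetical, and a plain IOException stands in 
for the ZooKeeper exception. Only the shape (instantiate, then an immediate 
read in checkEnvironment) follows the snippet above.

```java
import java.io.IOException;

/** Toy model of fail-fast startup; no ZooKeeper dependency, all names are hypothetical. */
class FailFastStartupSketch {
    enum SessionState { CONNECTED, EXPIRED }

    /** Hypothetical stand-in for the zk client handle. */
    static final class FakeZkClient {
        private final SessionState state;

        FakeZkClient(SessionState state) { this.state = state; }

        /** Fails on any read when the session is not usable. */
        byte[] getData(String path) throws IOException {
            if (state != SessionState.CONNECTED) {
                throw new IOException("ConnectionLoss for " + path);
            }
            return new byte[0];
        }
    }

    /** Plays the role of checkEnvironment: the first real read after instantiation. */
    static void checkEnvironment(FakeZkClient zk) throws IOException {
        zk.getData("/ledgers/INSTANCEID");
    }

    /** Returns true if startup completed, false if it was aborted. */
    static boolean start(SessionState stateAtStartup) {
        FakeZkClient zk = new FakeZkClient(stateAtStartup);
        try {
            checkEnvironment(zk);
            return true;
        } catch (IOException e) {
            // An unusable session surfaces on the first read, so startup stops here
            // without any extra check in the watcher.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("connected: " + start(SessionState.CONNECTED));
        System.out.println("expired:   " + start(SessionState.EXPIRED));
    }
}
```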

So IMHO, we can continue without any extra checks. What's your opinion?
                
> Bookie start is failing intermittently when zkclient connection delays
> ----------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-424
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-424
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.0.0, 4.1.0
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>             Fix For: 4.2.0
>
>         Attachments: BOOKKEEPER-424-1.patch, BOOKKEEPER-424.patch
>
>
> I'm seeing the following intermittent failure when there is a delay in 
> establishing the zkclient connection with the zkserver.
> {code}
> org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ledgers/INSTANCEID
>       at org.apache.bookkeeper.bookie.Bookie.checkEnvironment(Bookie.java:329)
>       at org.apache.bookkeeper.bookie.Bookie.<init>(Bookie.java:378)
>       at org.apache.bookkeeper.bookie.BookieInitializationTest.testStartBookieWithoutZKServer(BookieInitializationTest.java:253)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>       at java.lang.reflect.Method.invoke(Unknown Source)
>       at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>       at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>       at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>       at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>       at org.junit.internal.runners.statements.FailOnTimeout$1.run(FailOnTimeout.java:28)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /ledgers/INSTANCEID
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>       at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>       at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1131)
>       at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1160)
>       at org.apache.bookkeeper.bookie.Bookie.getInstanceId(Bookie.java:346)
>       at org.apache.bookkeeper.bookie.Bookie.checkEnvironment(Bookie.java:280)
>       ... 11 more
> {code}

