Arun, what did you observe? I think we already handle session expiry and ZooKeeper connection re-creation in the ZooKeeperClient wrapper: https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/zookeeper/ZooKeeperClient.java

We need to uncomment the code in Line 168:
https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/meta/AbstractZkLedgerManager.java#L168

(The change in twitter's branch performs those retries:
https://github.com/twitter/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/meta/AbstractZkLedgerManager.java#L211 )

- Sijie

On Wed, Jun 8, 2016 at 12:46 AM, Arun M. Krishnakumar <arunm...@gmail.com> wrote:

> Thanks for the pointer, Uma Gangumalla.
>
> Could you please give an overview of the fix in HDFS-3562?
>
> In the case of the BookKeeper client, the interesting state is the
> watches that the ReadOnlyLedgerHandle creates on the relevant ZooKeeper
> nodes. We would lose the notifications that happen during the timeout.
> What would be the best way to proceed in such scenarios? Should we
> reconstruct the state? Is there any other such state that needs to be
> considered?
>
> Thanks,
> Arun
>
> On Tue, Jun 7, 2016 at 3:40 PM, Uma gangumalla <umamah...@apache.org> wrote:
>
> > Good point, Venkateswara Rao.
> >
> > Some time ago, we worked on this scenario. A patch is available in
> > HDFS-3562. There we just tried to keep it on the application side. But
> > as a long-term solution, could this be placed on the BK side as a
> > utility module, so that all applications can benefit?
> >
> > Note: As I remember, the RetryableZookeeper idea was taken from HBase.
> >
> > Regards,
> > Uma
> >
> > On Mon, Jun 6, 2016 at 9:42 AM, Venkateswara Rao Jujjuri <jujj...@gmail.com> wrote:
> >
> > > If a bookie loses its connection with ZK, the connection gets
> > > re-established and life goes on. How are we handling it in the client
> > > case? Should we retry at the library level, or leave it up to the
> > > application? Any discussion/thoughts on this?
> > >
> > > --
> > > Jvrao
> > > ---
> > > First they ignore you, then they laugh at you, then they fight you,
> > > then you win. - Mahatma Gandhi
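
For readers following the thread, the "retry at the library level" option discussed above can be illustrated with a minimal sketch. It is not the code in ZooKeeperClient, AbstractZkLedgerManager, or HDFS-3562; the class `RetrySketch`, the method `withRetries`, and the exception `TransientException` are hypothetical names introduced only for illustration, and `TransientException` merely stands in for a transient failure such as ZooKeeper's `KeeperException.ConnectionLossException`. The sketch uses only the Java standard library.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of BookKeeper or ZooKeeper. */
final class RetrySketch {

    /** Stand-in for a transient failure such as a ZooKeeper connection loss. */
    static final class TransientException extends Exception {
        TransientException(String message) {
            super(message);
        }
    }

    private RetrySketch() {}

    /**
     * Runs op and retries it when it fails with TransientException,
     * doubling the wait between attempts. Any other exception, or a
     * transient failure on the last allowed attempt, is rethrown.
     */
    static <T> T withRetries(Callable<T> op, int maxAttempts, long baseBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = baseBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (TransientException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the failure to the caller
                }
                Thread.sleep(backoffMs); // wait before the next attempt
                backoffMs *= 2;          // exponential backoff
            }
        }
    }
}
```

A caller would wrap a metadata read in `withRetries(() -> readMetadata(), 5, 100L)`, where `readMetadata` is again a hypothetical method. Retrying alone does not answer Arun's question about missed notifications: after a session expiry the old watches are gone, so the client would presumably also need to re-read the node and set the watch again once the retried call succeeds.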