Hi Arun,

As I remember, the fix in HDFS-3562 was pretty straightforward: it just retries ZK ops on connection loss/disconnect.

If we want to make that utility generic then yes, as you said, we need to cover special cases like the ones you mentioned. In such scenarios, reconstructing the state may be one way to go, but at this point I am not sure about that; it needs more thought.

Another scenario to consider: what if a ZK op succeeds on the server side and then the connection is lost? A simple retry would then end up with "node already exists" kinds of issues, right? So we may need to identify which ops are safe to simply retry.

Rakesh, do you have better thoughts on these scenarios, considering ZK connection loss etc.?
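To make the "node already exists" concern concrete, here is a rough sketch in Python. It is not the HDFS-3562 patch and it does not use the ZooKeeper client API: ConnectionLoss, NodeExists, FlakyStore and create_with_retry are all hypothetical stand-ins, and the "compare the stored data" check is only one possible heuristic, which I am assuming for illustration.

```python
class ConnectionLoss(Exception):
    """Hypothetical stand-in for a ZK connection-loss error."""


class NodeExists(Exception):
    """Hypothetical stand-in for a ZK node-already-exists error."""


class FlakyStore:
    """In-memory stand-in for a ZK server (assumption, not a real client).

    It applies a create and then "loses" the reply for the first
    `drop_replies` successful creates, which is the case discussed above.
    """

    def __init__(self, drop_replies=1):
        self.nodes = {}
        self.drop_replies = drop_replies

    def create(self, path, data):
        if path in self.nodes:
            raise NodeExists(path)
        self.nodes[path] = data
        if self.drop_replies > 0:
            self.drop_replies -= 1
            # The write was applied, but the client never sees the reply.
            raise ConnectionLoss(path)
        return path

    def get(self, path):
        return self.nodes[path]


def create_with_retry(store, path, data, max_retries=3):
    """Retry create on connection loss.

    A NodeExists error is treated as success only if it follows a retry
    and the stored data matches what we tried to write. Otherwise it is
    re-raised, since the node may belong to someone else.
    """
    retried = False
    for _ in range(max_retries + 1):
        try:
            return store.create(path, data)
        except ConnectionLoss:
            retried = True
        except NodeExists:
            if retried and store.get(path) == data:
                return path  # our earlier attempt most likely went through
            raise
    raise ConnectionLoss(path)
```

A blind retry would surface NodeExists to the caller in the first case; with the check, the create is reported as successful. Matching data does not prove we were the creator, so a real utility would likely need something stronger, and ops like delete or sequential creates would need their own handling.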
Regards,
Uma

On Wed, Jun 8, 2016 at 12:46 AM, Arun M. Krishnakumar <arunm...@gmail.com> wrote:

> Thanks for the pointer, Uma Gangumalla.
>
> Could you please give an overview of the fix in HDFS-3562.
>
> In the case of Bookkeeper-client, the ReadOnlyLedgerHandle constructs a
> watcher on the relevant Zookeeper nodes. The interesting things are the
> watches created by the ReadOnlyLedgerHandle on the relevant zookeeper
> nodes. We would lose the notifications that happen during the timeout. What
> would be the best way to proceed in such scenarios ? Should we reconstruct
> the state ? Is there any other such state that needs to be considered ?
>
> Thanks,
> Arun
>
> On Tue, Jun 7, 2016 at 3:40 PM, Uma gangumalla <umamah...@apache.org> wrote:
>
> > Good point, Venkateswara Rao.
> >
> > Some time ago, we worked on this scenarios. Here is a patch
> > available. HDFS-3562
> > Here we just tried to keep at application side. But as a long term solution
> > this could be placed at BK side as utility module? So that all applications
> > can benefit.
> >
> > Note: As I remember RetryableZookeeper idea was taken from HBase.
> >
> > Regards,
> > Uma
> >
> > On Mon, Jun 6, 2016 at 9:42 AM, Venkateswara Rao Jujjuri <jujj...@gmail.com> wrote:
> >
> > > If a bookie looses connection with ZK, connection gets reestablished and
> > > life goes on. How are we handling it on the client case? Should we retry at
> > > library level?
> > > or leave it up to the application? Any discussion/thoughts on this?
> > >
> > > --
> > > Jvrao
> > > ---
> > > First they ignore you, then they laugh at you, then they fight you, then
> > > you win. - Mahatma Gandhi