On Tuesday, June 14, 2016, Venkateswara Rao Jujjuri <jujj...@gmail.com>
wrote:

> So Sijie, what happens in the following scenario?
>
> - AddEntry failed. Timeout or something.
> - Client received the error and tried to get another bookie from ZK.
> - Now ZK connection failed.
>
> Ideally it should do the following to avoid write errors:
>
> - Try to renew the lease with ZK; if that succeeds, get a new bookie,
> update the ensemble, and proceed with the write.
> - If it fails to renew the ZK session lease, establish a new session with
> ZK, go through the recovery process of updating watchers, get the new
> list of bookies, update the ensemble, send the write to the new bookie,
> and then report success to the client?


The ZooKeeper wrapper retries on session loss and session expiry. So if the
session expires during an ensemble change, it retries until it succeeds or
exhausts its retries. You can check ZooKeeperClient in the util package.
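
For readers who want a picture of "retry until it succeeds or the retries are
exhausted", here is a minimal sketch. It is not the ZooKeeperClient code: the
class names, the exception type, the doubling backoff, and the bookie address
are all hypothetical, and a real client would re-create its session where the
comment says so.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for a session-expired error; not a ZooKeeper class. */
class SessionExpiredSketchException extends Exception {
    SessionExpiredSketchException(String message) {
        super(message);
    }
}

class ZkRetrySketch {
    /**
     * Runs op, retrying on the simulated session-expiry error until it
     * succeeds or maxRetries retries have been used. The wait doubles after
     * each failed attempt (an assumption made for this sketch).
     */
    static <T> T callWithRetries(Callable<T> op, int maxRetries, long baseBackoffMs)
            throws Exception {
        long backoffMs = baseBackoffMs;
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (SessionExpiredSketchException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: surface the error to the caller
                }
                // A real client would re-create its session here before retrying.
                Thread.sleep(backoffMs);
                backoffMs *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated "get a replacement bookie" call that fails twice first.
        String bookie = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new SessionExpiredSketchException("session expired");
            }
            return "bookie-2:3181"; // made-up address
        }, 5, 10);
        System.out.println(bookie + " after " + calls[0] + " attempts");
    }
}
```

With a bounded retry count the write path still has to handle the final
failure, which is the case the question above is asking about.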



>
> Does this happen with Twitter code?
>
> Also, while updating watchers, does it handle transient error conditions?
> For example, in the middle of establishing watchers, some client process
> may miss watch notifications.


Yes. The retry ends only when a getData call succeeds and sets the watcher,
so no notification is missed.
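
To illustrate why a successful read that also sets the watch closes the gap,
here is a small simulation. It does not use the ZooKeeper API: FakeNode,
WatchRecoverySketch, and all method names are hypothetical. The one assumption
carried over from ZooKeeper is that a watch fires once and that a read can arm
the watch in the same call, so whatever changed while the session was gone
shows up in the value returned by the read.

```java
import java.util.function.Consumer;

/** Hypothetical in-memory stand-in for a znode; not the ZooKeeper API. */
class FakeNode {
    private String data = "v1";
    private Consumer<String> watcher; // one-shot watch
    private boolean connected = true;

    /** Simulates session expiry: the registered watch is dropped. */
    void expireSession() {
        connected = false;
        watcher = null;
    }

    void reconnect() {
        connected = true;
    }

    /** Reads the current data and arms a one-shot watch in the same call. */
    String getDataAndWatch(Consumer<String> w) {
        if (!connected) {
            throw new IllegalStateException("session expired");
        }
        watcher = w;
        return data;
    }

    void setData(String newData) {
        data = newData;
        Consumer<String> w = watcher;
        watcher = null; // the watch fires at most once
        if (w != null) {
            w.accept(newData);
        }
    }
}

class WatchRecoverySketch {
    private final FakeNode node;
    String lastSeen;

    WatchRecoverySketch(FakeNode node) {
        this.node = node;
    }

    /** Retries until a read succeeds; that same read re-arms the watch. */
    void readAndWatch(int maxRetries) {
        for (int attempt = 0; ; attempt++) {
            try {
                // On a change notification, read again and re-arm the watch.
                lastSeen = node.getDataAndWatch(changed -> readAndWatch(maxRetries));
                return;
            } catch (IllegalStateException e) {
                if (attempt >= maxRetries) {
                    throw e;
                }
                node.reconnect(); // a real client would wait for a new session
            }
        }
    }

    public static void main(String[] args) {
        FakeNode node = new FakeNode();
        WatchRecoverySketch client = new WatchRecoverySketch(node);
        client.readAndWatch(3);
        node.expireSession();
        node.setData("v2");     // no notification: the watch was lost
        client.readAndWatch(3); // the recovery read picks up v2 and re-arms
        node.setData("v3");     // delivered through the re-armed watch
        System.out.println(client.lastSeen);
    }
}
```

The update made while the session was expired is never delivered as an event,
but the recovery read returns it, which is the sense in which nothing is
missed.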

Sijie


>
>
> On Tue, Jun 14, 2016 at 3:21 PM, Sijie Guo <si...@apache.org> wrote:
>
> > On Tue, Jun 14, 2016 at 2:30 PM, Arun M. Krishnakumar <arunm...@gmail.com>
> > wrote:
> >
> > > Hi Sijie,
> > >
> > > I believe the ZooKeeperClient class handles the server connections and
> > > we haven't faced issues with that. Could you please confirm?
> > >
> >
> > Yes. It handles session expiry and recreates the connections.
> >
> >
> > >
> > > The issue was with the client connection in the AbstractZkLedgerManager
> > > class, as you mentioned above. The Twitter branch fix seems to recreate
> > > the listeners and reestablish state. Could you please push it to the
> > > community?
> > >
> >
> > Yes, I will.
> >
> >
> > >
> > > Thanks,
> > > Arun
> > >
> > > On Tue, Jun 14, 2016 at 2:17 PM, Sijie Guo <si...@apache.org> wrote:
> > >
> > > > Arun, what did you observe?
> > > >
> > > > I think we already handle session expiry and ZooKeeper connection
> > > > recreation in the ZooKeeperClient wrapper:
> > > >
> > > > https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/zookeeper/ZooKeeperClient.java
> > > >
> > > >
> > > > We need to uncomment the code in Line 168.
> > > >
> > > >
> > > > https://github.com/apache/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/meta/AbstractZkLedgerManager.java#L168
> > > >
> > > > (The change in Twitter's branch does those retries:
> > > >
> > > > https://github.com/twitter/bookkeeper/blob/master/bookkeeper-server/src/main/java/org/apache/bookkeeper/meta/AbstractZkLedgerManager.java#L211
> > > > )
> > > >
> > > > - Sijie
> > > >
> > > >
> > > >
> > > > On Wed, Jun 8, 2016 at 12:46 AM, Arun M. Krishnakumar <arunm...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks for the pointer, Uma Gangumalla.
> > > > >
> > > > > Could you please give an overview of the fix in HDFS-3562?
> > > > >
> > > > > In the case of the BookKeeper client, the interesting state is the
> > > > > watches that the ReadOnlyLedgerHandle creates on the relevant
> > > > > ZooKeeper nodes. We would lose the notifications that happen during
> > > > > the timeout. What would be the best way to proceed in such
> > > > > scenarios? Should we reconstruct the state? Is there any other such
> > > > > state that needs to be considered?
> > > > >
> > > > > Thanks,
> > > > > Arun
> > > > >
> > > > > On Tue, Jun 7, 2016 at 3:40 PM, Uma gangumalla <umamah...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Good point, Venkateswara Rao.
> > > > > >
> > > > > > Some time ago, we worked on this scenario. A patch is available in
> > > > > > HDFS-3562. There we just tried to keep the handling on the
> > > > > > application side. But as a long-term solution, could this be
> > > > > > placed on the BK side as a utility module, so that all
> > > > > > applications can benefit?
> > > > > >
> > > > > >
> > > > > > Note: As I remember, the RetryableZookeeper idea was taken from HBase.
> > > > > >
> > > > > > Regards,
> > > > > > Uma
> > > > > >
> > > > > > On Mon, Jun 6, 2016 at 9:42 AM, Venkateswara Rao Jujjuri <jujj...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > If a bookie loses its connection with ZK, the connection gets
> > > > > > > reestablished and life goes on. How are we handling this in the
> > > > > > > client case? Should we retry at the library level, or leave it
> > > > > > > up to the application? Any discussion/thoughts on this?
> > > > > > >
> > > > > > > --
> > > > > > > Jvrao
> > > > > > > ---
> > > > > > > First they ignore you, then they laugh at you, then they fight
> > you,
> > > > > then
> > > > > > > you win. - Mahatma Gandhi
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Jvrao
> ---
> First they ignore you, then they laugh at you, then they fight you, then
> you win. - Mahatma Gandhi
>
