On Tue, Sep 5, 2017, 21:28 Sijie Guo <guosi...@gmail.com> wrote:

> Enrico,
>
> Thank you for your feedback.
>
> Just FYI - this BP is the first part of the work we've been doing to
> improve metadata management on BookKeeper. We are doing this in three
> parts:
>
> - thin client: avoid talking to the metadata store directly in clients, moving
> the metadata management to the bookie side.
> - new metadata store: store metadata in bookies (both journal and
> snapshots are stored in zookeeper-based ledgers), reducing zookeeper
> usage.
> - eliminating zookeeper: eliminate zookeeper usage completely.
>

That sounds really great to me.

I feel it is a good roadmap, and I will help as much as possible in this
direction.



> One comment inline. I would let Jia answer other questions.
>
> On Tue, Sep 5, 2017 at 6:31 AM, Enrico Olivelli <eolive...@gmail.com>
> wrote:
>
> > Great to see you working on this!
> > It would be great to have such a feature, as it is the first step to a
> > 'standalone' BookKeeper mode
> >
> > Some complementary ideas/first look questions:
> > - the document does not talk about security, IMHO we have to cover at
> > least authentication and TLS, it would be great to leverage existing
> > AuthPlugins, as they are based on exchanging byte[] (as SASL wants)
> > - do we have some kind of "bootstrap servers list" configuration option?
> > should the list be complete or just a subset of bookies? at connection
> > the client could discover the list of other bookies
> > - will the client connect to only one bookie at a time? how will we deal
> > with errors?
> > - should the bookie write its gRPC endpoint info to ZK metadata? (this
> > will be useful for a bookie to tell the connected clients about other
> > bookies)
> > - the bookie will be somehow a proxy for zookeeper, I think that the
> > 'watch' part is the most complex, we will have to deal with
> > reconnections, errors... maybe it is worth writing more detail about this
> >
> > Minor issues:
> > - Maybe you can consider using ledgerId and not ledger_id, like in
> > LedgerMetadataFormat we are using lastEntryId
> >
>
>
>
>
> > - In the "motivation" part you write that having more clients
> > than the number of bookies would be a problem for zookeeper; actually
> > zookeeper is very good at dealing with a huge number of clients.
> > Actually I am always running clusters with 3-5 bookies and 10-100
> > writing clients and this has never given trouble.
> >
>
>
> First, I would not claim zookeeper is good at dealing with a huge number of
> clients when a zookeeper ensemble is serving only 10-100 clients.
>
> Second, based on my production experience, watches and session expiry are
> the two main issues with zookeeper when there are a lot of watchers and a
> lot of connections (a lot means thousands or even tens of thousands).
>
> Watches and session expiry are also the two main reasons that I don't like
> zookeeper:
>
> - session expiry: for simplicity, zookeeper ties session state directly
> to connection state. So when a connection is broken, the session usually
> expires (unless the client reconnects before the session expires); when a
> session has expired, the underlying connection cannot be used anymore, and
> the application has to close the connection and create a new client
> (establishing a new connection). It is understandable, since it makes
> zookeeper development super easy. However, it is a very bad design in
> practice, because it means that if you cannot establish a session, you
> can't use this connection and you have to create new connections: once
> your zookeeper is in a bad state (e.g. network issue or jvm gc), the whole
> environment will be in a very bad state (e.g. connection storm), and can
> barely recover from that state until you kill clients and ask them not to
> connect to zookeeper.
>
> - watcher: 1) it is a one-time watcher, so I can't reliably use it to get
> updates; 2) in order to set a watcher, you have to read a znode or get
> children. Imagine such a use case: clients are watching a list of znodes
> (e.g. list of bookies); when those clients expire, they have to rewatch
> the list. In order to rewatch the list, the clients have to read the list
> first, even if the list has never changed. It becomes a disaster, because
> all the clients will reread the whole list and overwhelm the network
> bandwidth, and cause session expiries.
>

There is interesting work from Jordan Z, the creator of Curator, on
persistent watches. I think this work could be useful for us:
https://github.com/apache/zookeeper/pull/136
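
To make the difference concrete, here is a rough toy model of the two watch
styles. It is only a sketch under my own assumptions: `ToyRegistry`,
`simulate` and the bookie names are hypothetical, nothing here is the
ZooKeeper or Curator API, and it only counts the list reads that clients
need to keep a watch armed.

```python
class ToyRegistry:
    """In-memory stand-in for a watched bookie list (NOT the ZooKeeper API)."""

    def __init__(self, bookies):
        self.bookies = list(bookies)
        self.reads = 0        # number of full-list reads served
        self.one_shot = []    # callbacks that are consumed when they fire
        self.persistent = []  # callbacks that stay registered

    def get_children(self, watcher=None):
        # One-shot style: a watch can only be set as part of a read.
        self.reads += 1
        if watcher is not None:
            self.one_shot.append(watcher)
        return list(self.bookies)

    def add_persistent_watch(self, watcher):
        # Persistent style (assumed): register once, no read required.
        self.persistent.append(watcher)

    def update(self, bookies):
        self.bookies = list(bookies)
        fired, self.one_shot = self.one_shot, []  # one-shot watches are consumed
        for callback in fired + self.persistent:
            callback()


def simulate(num_clients, num_updates, persistent):
    """Return (list reads served, notifications delivered)."""
    registry = ToyRegistry(["bookie-1", "bookie-2", "bookie-3"])
    events = [0]

    def make_client():
        def on_change():
            events[0] += 1
            if not persistent:
                # A one-shot watch is gone after firing: re-read to re-arm it.
                registry.get_children(on_change)
        return on_change

    for _ in range(num_clients):
        callback = make_client()
        if persistent:
            registry.add_persistent_watch(callback)
        else:
            registry.get_children(callback)

    for i in range(num_updates):
        registry.update(["bookie-%d" % n for n in range(i + 2)])
    return registry.reads, events[0]


if __name__ == "__main__":
    print("one-shot  :", simulate(1000, 5, persistent=False))
    print("persistent:", simulate(1000, 5, persistent=True))
```

In this toy, both styles deliver the same number of notifications, but every
one-shot client has to read the whole list again each time it re-arms. A real
client may still read the list after a change to fetch the new contents, so
the model overstates the gap; it is only meant to show where the forced
re-reads come from.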

>
>
> I can tell you about a lot of production issues related to the above two
> behaviors (either one of them, or a combination of them) if you are
> interested.
>

Sure, I believe you.

Cheers
Enrico

>
>
>
> >
> > Future:
> > - as bookies will be proxies maybe we should take care not to overwhelm a
> > bookie with too many clients
> > - iteration on ledgers: sometimes the client enumerates ledgers but is
> > not interested in having all of them; as we are using the bookie as a
> > proxy, maybe some kind of "filter" (at least on custom metadata) would be
> > great to limit the number of returned items. Another point: I don't know
> > gRPC, but it does not seem to be very clear how to 'stop' the iteration
> >
> > -- Enrico
> >
> >
> > 2017-09-05 15:10 GMT+02:00 Jia Zhai <zhaiji...@gmail.com>:
> >
> > > Hi all,
> > >
> > > I have just posted a proposal to remove zookeeper dependency from
> > > bookkeeper client, to make bookkeeper client a thin client:
> > >
> > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-16%3A+remove+zookeeper+dependency+from+bookkeeper+client
> > >
> > >
> > > BookKeeper uses zookeeper for service discovery (discovering the
> > > available bookies in the cluster) and metadata management (storing all
> > > the metadata for ledgers). However, it exposes the metadata storage
> > > directly to the clients, making the bookkeeper client a very thick
> > > client. It also exposes some problems.
> > >
> > > This BP explores the possibility of eliminating zookeeper completely
> > > from the client side, to produce a thin bookkeeper client.
> > >
> > > I will send a patch as soon as we agree on the proposal.
> > >
> > >
> > > Thanks.
> > >
> > > -Jia
> > >
> >
>
-- Enrico Olivelli
