Re: [DISCUSS] BP-16: remove zookeeper dependency from bookkeeper client

Sijie Guo Tue, 05 Sep 2017 12:28:48 -0700

Enrico,

Thank you for your feedback.


Just FYI - this BP is the first part of the work we've been doing to
improve metadata management in BookKeeper. We are doing this in three
parts:

- thin client: avoid talking to the metadata store directly in clients,
moving the metadata management to the bookie side.
- new metadata store: store metadata in bookies (both journal and
snapshots are stored at zookeeper-based ledgers), reducing zookeeper usage.
- eliminating zookeeper: eliminate zookeeper usage completely.

One comment inline. I will let Jia answer the other questions.

On Tue, Sep 5, 2017 at 6:31 AM, Enrico Olivelli <eolive...@gmail.com> wrote:

> Great to see you working on this !
> It would be great to have such a feature, as it is the first step to a
> 'standalone' BookKeeper mode
>
> Some complementary ideas/first look questions:
> - the document does not talk about security, IMHO we have at least to cover
> authentication and TLS, it would be great to leverage existing AuthPlugins,
> as they are based on exchanging byte[] (as SASL wants)
> - do we have some kind of "bootstrap servers list" configuration option ?
> the list should be complete or just a subset of bookies ? at connection the
> client could discover the list of other bookies
> - will the client connect to only one bookie at a time ? how we will deal
> with errors ?
> - should the bookie write on ZK metadata its gRPC endpoint info ? (this
> will be useful for a bookie to tell about other bookies to the connected
> clients)
> - the bookie will be somehow a proxy for zookeeper, I think that the
> 'watch' part is the more complex, we will have to deal with reconnections,
> errors....maybe it is worth to write more detail about this
>
> Minor issues:
> - Maybe you can consider using ledgerId and not ledger_id, like in
> LedgerMetadataFormat we are using lastEntryId
>




> - In the "motivation" part you write that having more clients than the
> number of bookies would be a problem for zookeeper; actually zookeeper is
> very good at dealing with a huge number of clients. I am always running
> clusters with 3-5 bookies and 10-100 writing clients and this has never
> given trouble
>


First, I would not claim zookeeper is good at dealing with a huge number of
clients when the zookeeper ensemble is serving only 10-100 clients.

Second, based on my production experience, watches and session expiry are
the two main issues with zookeeper when there are a lot of watchers and a
lot of connections (a lot means thousands or even tens of thousands).

Watches and session expiry are also the two main reasons that I don't like
zookeeper:

- session expiry. For simplicity, zookeeper ties session state directly to
connection state. So when a connection is broken, the session usually
expires (unless the client reconnects before the session expires). When a
session has expired, the underlying connection can not be used anymore; the
application has to close the connection and create a new client
(establishing a new connection). It is understandable that this makes
zookeeper development super easy. However, it is a very bad design in
practice, because it means that if you can not establish a session, you
can't use the connection and you have to create new connections. Once your
zookeeper is in a bad state (e.g. network issue or jvm gc), the whole
environment will be in a very bad state (e.g. connection storm), and can
barely recover from that state until you kill clients and ask them not to
connect to zookeeper.

- watcher: 1) it is a one-time watcher, so I can't reliably use it to get
updates; 2) in order to set a watcher, you have to read a znode or get its
children. Imagine such a use case: clients are watching a list of znodes
(e.g. the list of bookies). When those clients expire, they have to rewatch
the list. In order to rewatch the list, the clients have to read the list
first, even if the list has never changed. It becomes a disaster, because
all the clients will reread the whole list, overwhelm the network
bandwidth, and cause session expiries.
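
To make the re-read cost concrete, here is a rough sketch of the two
behaviors above as a toy in-memory model. It is not the zookeeper API:
ToyStore, ToyClient and the numbers (1000 clients, 100 bookies) are made up
for illustration. It only models that a watch is one-shot, that it can only
be set by doing a read, and that watches are dropped when sessions expire.

```python
# Toy in-memory model of one-shot watches; NOT the zookeeper API.
# All class names, method names and sizes here are hypothetical.

class ToyStore:
    """Holds a single children list (e.g. the list of bookies)."""

    def __init__(self, children):
        self.children = list(children)
        self.watchers = []      # one-shot callbacks, cleared once fired
        self.entries_sent = 0   # list entries shipped to clients by reads

    def get_children(self, watcher=None):
        # A watch can only be registered as a side effect of a read.
        if watcher is not None:
            self.watchers.append(watcher)
        self.entries_sent += len(self.children)
        return list(self.children)

    def set_children(self, children):
        self.children = list(children)
        # One-shot semantics: drop all watches before notifying.
        fired, self.watchers = self.watchers, []
        for notify in fired:
            notify()

    def expire_all_sessions(self):
        # Watches are dropped together with their sessions.
        self.watchers = []


class ToyClient:
    def __init__(self, store):
        self.store = store
        self.view = None
        self.watch()

    def watch(self):
        # Re-reading the whole list is the only way to re-arm the watch.
        self.view = self.store.get_children(watcher=self.watch)


bookies = ["bookie-%d" % i for i in range(100)]
store = ToyStore(bookies)
clients = [ToyClient(store) for _ in range(1000)]
initial = store.entries_sent

# Every session expires (e.g. after a long gc pause on the server side).
store.expire_all_sessions()
for client in clients:
    client.watch()          # new session: must read again to watch again
after_expiry = store.entries_sent - initial

print("entries re-read with no change to the list:", after_expiry)
```

In this model the clients re-read 1000 * 100 = 100000 entries after the
expiry although the list itself never changed, and every real change to
the list costs another full read per client just to re-arm the watch.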


I can share a lot of production issues related to the above two behaviors
(either one of them, or a combination of them) if you are interested.



>
> Future:
> - as bookies will be proxies maybe we should take care not to overwhelm a
> bookie with too many clients
> - iteration on ledgers, sometimes the client enumerates ledgers but it is
> not interested in having all of them, as we are using the bookie as proxy
> maybe some kind of "filter" (at least on custom metadata) would be great
> to limit the number of returned items. Other point I don't know gRPC but it
> does not seem to be very clear how to 'stop' the iteration
>
> -- Enrico
>
>
> 2017-09-05 15:10 GMT+02:00 Jia Zhai <zhaiji...@gmail.com>:
>
> > Hi all,
> >
> > I have just posted a proposal to remove zookeeper dependency from
> > bookkeeper client, to make bookkeeper client a thin client:
> >
> > https://cwiki.apache.org/confluence/display/BOOKKEEPER/
> > BP-16%3A+remove+zookeeper+dependency+from+bookkeeper+client
> >
> >
> > BookKeeper uses zookeeper for service discovery (discovering the
> available
> > bookies in the cluster), metadata management (storing all the metadata
> for
> > ledgers). However it exposes the metadata storage directly to the
> clients,
> > making bookkeeper client a very thick client. It also exposes some
> > problems.
> >
> > This BP explores the possibility of eliminating zookeeper completely from
> > client side, to produce a thin bookkeeper client.
> >
> > I will send a patch as soon as we agree on the proposal.
> >
> >
> > Thanks.
> >
> > -Jia
> >
>
