On Tue, Sep 5, 2017, 21:28 Sijie Guo <guosi...@gmail.com> wrote:

> Enrico,
>
> Thank you for your feedback.
>
> Just FYI - this BP is the first part of the work that we've been doing to
> improve metadata management on BookKeeper. We are doing this in three
> parts:
>
> - thin client: avoid talking to the metadata store directly in clients,
>   moving the metadata management to the bookie side.
> - new metadata store: storing metadata in bookies (both journal and
>   snapshots are stored in zookeeper-based ledgers), reducing the zookeeper
>   usage.
> - eliminating zookeeper: eliminate zookeeper usage completely.
>

That sounds really great to me. I feel it is a good roadmap, and I will help
as much as possible in this direction.

> One comment inline. I would let Jia answer other questions.
>
> On Tue, Sep 5, 2017 at 6:31 AM, Enrico Olivelli <eolive...@gmail.com>
> wrote:
>
> > Great to see you working on this!
> > It would be great to have such a feature, as it is the first step to a
> > 'standalone' BookKeeper mode.
> >
> > Some complementary ideas/first-look questions:
> > - the document does not talk about security; IMHO we have at least to
> >   cover authentication and TLS. It would be great to leverage existing
> >   AuthPlugins, as they are based on exchanging byte[] (as SASL wants)
> > - do we have some kind of "bootstrap servers list" configuration option?
> >   Should the list be complete or just a subset of bookies? At connection
> >   the client could discover the list of other bookies
> > - will the client connect to only one bookie at a time? How will we deal
> >   with errors?
> > - should the bookie write its gRPC endpoint info to the ZK metadata?
> >   (this will be useful for a bookie to tell the connected clients about
> >   other bookies)
> > - the bookie will be somehow a proxy for zookeeper. I think that the
> >   'watch' part is the most complex: we will have to deal with
> >   reconnections, errors... maybe it is worth writing more detail about
> >   this
> >
> > Minor issues:
> > - Maybe you can consider using ledgerId and not ledger_id, like in
> >   LedgerMetadataFormat where we are using lastEntryId
> >
> > - In the "motivation" part you write that having more clients than the
> >   number of bookies would be a problem for zookeeper; actually
> >   zookeeper is very good at dealing with a huge number of clients.
> >   Actually I am always running clusters with 3-5 bookies and 10-100
> >   writing clients and this has never given trouble.
> >
>
> First, I would not claim zookeeper is good at dealing with a huge number of
> clients when a zookeeper ensemble is serving only 10-100 clients.
>
> Second, based on my production experience, watches and session expiry are
> the two main issues on zookeeper when there are a lot of watchers and a lot
> of connections (a lot means thousands or even tens of thousands).
>
> Watches and session expiry are also the two main reasons that I don't like
> zookeeper:
>
> - session expiry. For simplicity, zookeeper ties session state directly
>   to connection state, so when a connection is broken, the session usually
>   expires (unless the client reconnects before the session expires). When
>   a session has expired, the underlying connection cannot be used anymore;
>   the application has to close the connection and create a new client
>   (establishing a new connection). It is understandable that this makes
>   zookeeper development super easy. However, it is a very bad design in
>   practice, because it means that if you cannot establish a session, you
>   can't use this connection and you have to create new connections: once
>   your zookeeper is in a bad state (e.g. network issue or jvm gc), the
>   whole environment will be in a very bad state (e.g. connection storm),
>   and can barely recover from that state until you kill clients and ask
>   them not to connect to zookeeper.
>
> - watcher: 1) it is a one-time watcher, so I can't reliably use it to get
>   updates; 2) in order to set a watcher, you have to read a znode or get
>   children. Imagine such a use case: clients are watching a list of znodes
>   (e.g. list of bookies); when those clients expire, they have to rewatch
>   the list. In order to rewatch the list, the clients have to read the
>   list first even if the list has never changed.
>   It becomes a disaster, because all the clients will reread the whole
>   list, overwhelm the network bandwidth, and cause session expiry.
>

There is interesting work from Jordan Z, the creator of Curator, on
persistent watches. I think this work could be useful for us:
https://github.com/apache/zookeeper/pull/136

> I can tell a lot of production issues related to the above two behaviors
> (either one of them, or a combination of them) if you are interested.
>

Sure, I believe you.

Cheers
Enrico

> > Future:
> > - as bookies will be proxies, maybe we should take care not to overwhelm
> >   a bookie with too many clients
> > - iteration on ledgers: sometimes the client enumerates ledgers but is
> >   not interested in having all of them. As we are using the bookie as a
> >   proxy, maybe some kind of "filter" (at least on custom metadata) would
> >   be great to limit the number of returned items. Another point: I don't
> >   know gRPC, but it does not seem to be very clear how to 'stop' the
> >   iteration
> >
> > -- Enrico
> >
> > 2017-09-05 15:10 GMT+02:00 Jia Zhai <zhaiji...@gmail.com>:
> >
> > > Hi all,
> > >
> > > I have just posted a proposal to remove zookeeper dependency from
> > > bookkeeper client, to make bookkeeper client a thin client:
> > >
> > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-16%3A+remove+zookeeper+dependency+from+bookkeeper+client
> > >
> > > BookKeeper uses zookeeper for service discovery (discovering the
> > > available bookies in the cluster) and metadata management (storing all
> > > the metadata for ledgers). However, it exposes the metadata storage
> > > directly to the clients, making bookkeeper client a very thick client.
> > > It also exposes some problems.
> > >
> > > This BP explores the possibility of eliminating zookeeper completely
> > > from the client side, to produce a thin bookkeeper client.
> > >
> > > I will send a patch as soon as we agree on the proposal.
> > >
> > > Thanks.
> > >
> > > -Jia
> > >

--
Enrico Olivelli
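
To make the watcher point from the thread concrete, here is a minimal sketch of the re-read cost it describes. It is a toy in-memory model only, not ZooKeeper or BookKeeper code: the names Znode, Client and simulate are hypothetical, and the model assumes just the two behaviors stated above (a watch fires once, and a watch can only be set by reading the data).

```python
class Znode:
    """Toy stand-in for a znode holding a list of children (hypothetical)."""

    def __init__(self, children):
        self.children = list(children)
        self.watchers = []
        self.items_sent = 0  # total child names shipped to clients

    def get_children(self, watcher=None):
        # In this model, reading is the only way to register a watch.
        if watcher is not None:
            self.watchers.append(watcher)
        self.items_sent += len(self.children)
        return list(self.children)

    def drop_watch(self, watcher):
        # Models a watch that is lost together with an expired session.
        if watcher in self.watchers:
            self.watchers.remove(watcher)

    def set_children(self, children):
        self.children = list(children)
        # One-time semantics: pending watches are cleared, then fired once.
        fired, self.watchers = self.watchers, []
        for notify in fired:
            notify()


class Client:
    """Toy client that keeps a local view of the list (hypothetical)."""

    def __init__(self, znode):
        self.znode = znode
        self.view = []

    def watch(self):
        # Re-reads the full list every time, because that is how a watch is set.
        self.view = self.znode.get_children(watcher=self.watch)

    def session_expired(self):
        # The old watch is gone; re-watching forces a full re-read,
        # even though the list itself has not changed.
        self.znode.drop_watch(self.watch)
        self.watch()


def simulate(num_clients, num_bookies):
    """Return (items sent on initial watch, items sent after mass expiry)."""
    znode = Znode([f"bookie-{i}" for i in range(num_bookies)])
    clients = [Client(znode) for _ in range(num_clients)]
    for client in clients:
        client.watch()
    initial = znode.items_sent
    for client in clients:
        client.session_expired()
    return initial, znode.items_sent - initial


if __name__ == "__main__":
    initial, after_expiry = simulate(num_clients=1000, num_bookies=100)
    print(initial, after_expiry)
```

In this model the traffic after a mass session expiry equals the traffic of the initial read (clients times list size) although nothing in the list changed, which is the amplification described in the reply above.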