This blog post also touches briefly on the limitations of ZooKeeper in BookKeeper: https://bitworks.software/blog/en/2017-07-12-replicated-scalable-commitlog-with-apachebookkeeper.html
On Thu, Sep 7, 2017 at 9:45 AM, Jia Zhai <zhaiji...@gmail.com> wrote:
> 👍. Thanks a lot for the suggestions and feedback.
>
> On Thu, Sep 7, 2017 at 4:24 AM, Sijie Guo <guosi...@gmail.com> wrote:
>
>> On Wed, Sep 6, 2017 at 1:07 PM, Enrico Olivelli <eolive...@gmail.com> wrote:
>>
>> > Off topic curiosity... Jia and Sijie, do you think we are going to drop ZK from DL too?
>> >
>>
>> Yes. That's the goal - 1) for large deployment, we are trying to overcome the limitation of zookeeper; 2) for smaller deployments, it will make deployment much easier, you just need to deploy a cluster of bookies. Once it is done, you can use ledger api or log stream api to access the bookkeeper cluster.
>>
>> Both DL and BK are metadata storage pluggable. They have very clear interfaces on defining metadata operations. So it is straightforward to use a different metadata storage.
>>
>> > Enrico
>> >
>> > On Wed, Sep 6, 2017, 19:51 Enrico Olivelli <eolive...@gmail.com> wrote:
>> >
>> > > On Wed, Sep 6, 2017, 18:25 Sijie Guo <guosi...@gmail.com> wrote:
>> > >
>> > >> On Sep 6, 2017 4:57 AM, "Enrico Olivelli" <eolive...@gmail.com> wrote:
>> > >>
>> > >> Thank you Sijie and Jia for your comments and explanations,
>> > >> answers inline
>> > >>
>> > >> 2017-09-06 2:23 GMT+02:00 Jia Zhai <zhaiji...@gmail.com>:
>> > >>
>> > >> > Thanks a lot Enrico and Sijie for your comments and information on this.
>> > >> >
>> > >> > On Tue, Sep 5, 2017 at 9:31 PM, Enrico Olivelli <eolive...@gmail.com> wrote:
>> > >> >
>> > >> > > Great to see you working on this !
>> > >> > > It would be great to have such feature, as it is the first step to a 'standalone' BookKeeper mode
>> > >> > >
>> > >> > > Some complementary ideas/first look questions:
>> > >> > > - the document does not talk about security, IMHO we have at least to cover authentication and TLS, it would be great to leverage existing AuthPlugins, as they are based on exchanging byte[] (as SASL wants)
>> > >> > >
>> > >> > [Jia] It is a good idea. We left the security part for now for a few reasons. 1) Make this BP more focused on removing zookeeper dependencies from client. 2) It is introduced as a separate implementation of existing interfaces. So it won’t impact existing security story. And for sure, we will add the security part later after this.
>> > >> >
>> > >>
>> > >> I am fine, I am only afraid that we won't be able to support it in the (near) future,
>> > >> maybe you could just only cite the security story and add some reference to how we would deal with it in future
>> > >>
>> > >> The new ledger manager will be first marked as experimental, until it is stable and has security features.
>> > >>
>> > >> How does that sound?
>> > >>
>> > >
>> > > Ok
>> > >
>> > >> > > - do we have some kind of "bootstrap servers list" configuration option ? the list should be complete or just a subset of bookies ? at connection the client could discover the list of other bookies
>> > >> > >
>> > >> > [Jia] Yes, we will have a `clientBootstrapBookies` setting in the server set. It can be a list of bookies or simply a DNS name over the bookies. Will add this to the BP
>> > >> >
>> > >> > > - will the client connect to only one bookie at a time ? how we will deal with errors ?
>> > >> > >
>> > >> > [Jia] It will connect to the list of bootstrap servers. gRPC will load balance the requests and manage the connection errors.
>> > >> >
>> > >> > > - should the bookie write on ZK metadata its gRPC endpoint info ? (this will be useful for a bookie to tell about other bookies to the connected clients)
>> > >> > >
>> > >> > [Jia] No, it won’t. We don’t see a strong reason to add it. Especially eventually we may eliminate zookeeper completely.
>> > >> > It can be a fixed port `3281`, or in a scheduler-based environment, it is very easy to have a load balancer sitting in front of those bookies.
>> > >> >
>> > >>
>> > >> I think a fixed port is not a good way.
>> > >> You will not be able to run more than one bookie on a single host.
>> > >>
>> > >> We should support:
>> > >> - configurable port
>> > >> - ephemeral port for tests
>> > >>
>> > >> I think what Jia means is a configurable port, but it is a relatively fixed port, in that the client doesn't discover this port from zookeeper.
>> > >>
>> > >
>> > > Very good
>> > >
>> > >> Ideally I would like to have the local transport option, in order to have a single JVM, but this is not a blocker problem, as we are running gRPC on netty it should be feasible or we can create some kind of short-circuit between the client and the Bookie
>> > >>
>> > >> gRPC supports an in-process channel. So you don't need to use the low level netty settings.
>> > >
>> > > Great
>> > >
>> > > So it sounds all good to me thanks
>> > >
>> > > Enrico
>> > >
>> > >> I am OK for not writing this to the bookie metadata, leaving it up to the client to have a configured list of bookies enabled for metadata operations
>> > >>
>> > >> > > - the bookie will be somehow a proxy for zookeeper, I think that the 'watch' part is the more complex, we will have to deal with reconnections, errors....maybe it is worth to write more detail about this
>> > >> > >
>> > >> > [Jia] The `watch` API is using the `streaming` rpc in gRPC. It is a straightforward proxy behavior, if a connection is broken, the client will simply retry on watching again.
>> > >> >
>> > >> > > Minor issues:
>> > >> > > - Maybe you can consider using ledgerId and not ledger_id, like in LedgerMetadataFormat we are using lastEntryId
>> > >> > >
>> > >> > [Jia] Thanks, it is protobuf style. The protobuf will convert `ledger_id` to `ledgerId`. We don’t need to worry about this.
>> > >> >
>> > >>
>> > >> got it, thanks
>> > >>
>> > >> > > - In the "motivation" part you write that having more clients than the number of bookies would be a problem for zookeeper, actually zookeeper is very good at dealing with a huge number of clients. Actually I am always running clusters with 3-5 bookies and 10-100 writing clients and this has never given troubles
>> > >> >
>> > >> > [Jia] :) Seems “10-100 writing clients” is not “a huge number of clients”.
>> > >> >
>> > >>
>> > >> OK, I agree with you and Sijie, I have no experience of larger clusters
>> > >>
>> > >> > > Future:
>> > >> > > - as bookies will be proxies maybe we should take care not to overwhelm a bookie with too many clients
>> > >> > >
>> > >> > [Jia] First, gRPC is based on Netty, the protocol is HTTP/2, so the connection is multiplexed. We don’t need to worry about connection count. Second, all the bookies are treated equally for the metadata operations, gRPC will load balance the requests across the bookies. We don’t need to worry about some bookies being overwhelmed.
>> > >> >
>> > >>
>> > >> gRPC sounds great
>> > >>
>> > >> > > - iteration on ledgers, sometimes the client enumerates ledgers but it is not interested in having all of them, as we are using the bookie as proxy maybe some kind of "filter" (at least on custom metadata) would be great to limit the number of returned items. Other point I don't know gRPC but it does not seem to be very clear how to 'stop' the iteration
>> > >> > >
>> > >> > [Jia] Thanks, we can add it later. For now, we would like to focus on adding the features the ledger manager needs.
>> > >> >
>> > >>
>> > >> Yup
>> > >>
>> > >> -- Enrico
>> > >>
>> > >> > > -- Enrico
>> > >> > >
>> > >> > > 2017-09-05 15:10 GMT+02:00 Jia Zhai <zhaiji...@gmail.com>:
>> > >> > >
>> > >> > > > Hi all,
>> > >> > > >
>> > >> > > > I have just posted a proposal to remove zookeeper dependency from bookkeeper client, to make bookkeeper client a thin client:
>> > >> > > >
>> > >> > > > https://cwiki.apache.org/confluence/display/BOOKKEEPER/BP-16%3A+remove+zookeeper+dependency+from+bookkeeper+client
>> > >> > > >
>> > >> > > > BookKeeper uses zookeeper for service discovery (discovering the available bookies in the cluster), metadata management (storing all the metadata for ledgers). However it exposes the metadata storage directly to the clients, making bookkeeper client a very thick client. It also exposes some problems.
>> > >> > > >
>> > >> > > > This BP explores the possibility of eliminating zookeeper completely from client side, to produce a thin bookkeeper client.
>> > >> > > >
>> > >> > > > I will send a patch as soon as we agree on the proposal.
>> > >> > > >
>> > >> > > > Thanks.
>> > >> > > >
>> > >> > > > -Jia
>> > >> > > >
>> > > --
>> > > -- Enrico Olivelli
>> > --
>> > -- Enrico Olivelli
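
For reference, the "if a connection is broken, the client will simply retry on watching again" behaviour discussed in the quoted thread could look roughly like the sketch below. This is only an illustration under stated assumptions: it does not use gRPC or any BookKeeper API, and the names `watch_with_retry`, `open_stream` and `make_flaky_stream` are hypothetical stand-ins (the fake stream simulates a server-streaming call that breaks). A real client would probably also need to re-read the current state after reconnecting, since events may be missed while the stream is down.

```python
import time
from typing import Callable, Iterator, List


def watch_with_retry(
    open_stream: Callable[[], Iterator[str]],
    on_event: Callable[[str], None],
    max_retries: int = 5,
    backoff_seconds: float = 0.0,
) -> int:
    """Consume a streaming watch and re-open the stream whenever it breaks.

    `open_stream` is a hypothetical stand-in for whatever call starts the
    streaming RPC. Returns the number of reconnects that were needed.
    """
    reconnects = 0
    while True:
        try:
            for event in open_stream():
                on_event(event)
            return reconnects  # the stream ended cleanly
        except ConnectionError:
            if reconnects >= max_retries:
                raise  # give up and surface the error to the caller
            reconnects += 1
            time.sleep(backoff_seconds)


def make_flaky_stream(batches: List[List[str]]) -> Callable[[], Iterator[str]]:
    """Fake opener: the n-th call yields batches[n]; every stream except
    the last one breaks with ConnectionError after its events."""
    calls = iter(range(len(batches)))

    def open_stream() -> Iterator[str]:
        i = next(calls)

        def gen() -> Iterator[str]:
            for event in batches[i]:
                yield event
            if i < len(batches) - 1:
                raise ConnectionError("stream broken")

        return gen()

    return open_stream


# Usage: the first stream breaks after one event, the second finishes.
received: List[str] = []
reconnects = watch_with_retry(
    make_flaky_stream([["ledger-1 created"],
                       ["ledger-2 created", "ledger-1 deleted"]]),
    received.append,
)
```

The retry budget and backoff are arbitrary choices for the sketch; the thread itself does not specify either.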