jiazhai commented on a change in pull request #1113: BP-28: Etcd as metadata
File path: site/bps/BP-28-etcd-as-metadata-store.md
@@ -0,0 +1,102 @@
+title: "BP-28: use etcd as metadata store"
+state: 'Under Discussion'
+Currently bookkeeper uses zookeeper as the metadata store. However, there are a couple of issues with the current approach, especially with using zookeeper.
+These issues include:
+1. You need to allocate special nodes for zookeeper. These nodes need to be treated specially and have their own monitoring.
+ Ops need to understand both bookies and zookeeper.
+2. ZooKeeper is the scalability bottleneck. ZooKeeper doesn't scale writes as you add nodes. This means that if your bookkeeper
+ cluster reaches the maximum write throughput that ZK can sustain, you've reached the maximum capacity of your cluster, and there's nothing you
+ can do (except buy bigger hardware for your special nodes).
+3. ZooKeeper forces you into its programming model. In general, its programming model is not too bad. However, it becomes problematic when
+ the scale goes up (e.g. as the number of clients and watchers increases). The issues usually come from _session expires_ and _watchers_.
+ - *Session Expires*: For simplicity, ZooKeeper ties session state directly to connection state. So when a connection is broken, the session usually expires (unless the client reconnects before the session times out), and once a session has expired, the underlying connection cannot be used anymore: the application has to close the connection and create a new client (a new connection). It is understandable that this makes zookeeper development easy. However, in practice it means that if you cannot re-establish a session, you can't use the connection and have to create new ones. Once your zookeeper cluster is in a bad state (e.g. due to network issues or JVM GC pauses), the whole cluster is often unable to recover because of the connection storm caused by session expires.
+ - *Watchers*: ZooKeeper watchers are one-time watchers, so applications can't reliably use them to get updates. To set a watcher, you have to read a znode or get its children. Consider a use case where clients are watching a list of znodes (e.g. the list of bookies): when those clients' sessions expire, they have to fetch the list of znodes again in order to re-watch it, even if the list has never changed.
+ - The combination of session expires and watchers is often the root cause of critical zookeeper outages.
+This proposal is to explore other existing systems, such as etcd, as the metadata store. Using etcd doesn't address concern #1; however, it might
+address concerns #2 and #3 to some extent. And if you are running bookkeeper in k8s, there is already an etcd instance available. It can become easier to run
+bookkeeper on k8s if we can use etcd as the metadata store.
nit: it seems some extra line breaks were introduced by copy-paste?
This is an automated message from the Apache Git Service.