Re: intern project idea: decouple zab from zookeeper

Flavio Junqueira Sun, 01 Jun 2014 14:54:00 -0700

I'm not sure it is worth turning this discussion into a bk vs. zk/zab debate. I 
think they target different spaces, although they both deal with replication. 
A separate zab implementation does sound worthwhile, but it isn't clear that 
it is worth separating zab out within the zookeeper code base.


There seem to be some misconceptions here, so here are some clarifications: 

- Zab itself doesn't deal with snapshots; it essentially replicates a log. The 
use of snapshots is an optimization to speed up recovery, and sure, it fits 
well into the framework of the protocol.
- BookKeeper indeed relies on zk because it requires a component for 
configuration and metadata of ledgers. By relying on a separate configuration 
component, the pool of bookies can grow and shrink arbitrarily, and such 
changes do not affect write performance the way they do with zk. The 
configuration component, however, needs the properties of a protocol like zab, 
so we still need something like zab.
- Calling BK heavyweight is a bit of a stretch. Bookies + zk makes only two 
components! These are not production numbers, but I don't see a deployment with 
fewer than 10 machines (5 for ZK + 5 bookies) being very interesting. If that's 
a significant fraction of your overall server footprint, then sure, it is heavy 
for you.
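
To make the first point concrete, here is a minimal Python sketch (not zab or 
zookeeper code) of why a snapshot is only an optimization over log replay. The 
entry format, the Snapshot class, and all function names are assumptions made 
up for illustration: state is a plain dict, and each log entry is a (key, 
value) write.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, str]  # hypothetical log entry: a (key, value) write


@dataclass
class Snapshot:
    last_index: int        # number of log entries already reflected in state
    state: Dict[str, str]  # the state after applying those entries


def apply_entry(state: Dict[str, str], entry: Entry) -> None:
    """Apply one replicated log entry to the in-memory state."""
    key, value = entry
    state[key] = value


def recover(log: List[Entry], snapshot: Optional[Snapshot] = None) -> Dict[str, str]:
    """Rebuild state from the log, optionally skipping the snapshotted prefix."""
    if snapshot is None:
        state, start = {}, 0
    else:
        state, start = dict(snapshot.state), snapshot.last_index
    for entry in log[start:]:
        apply_entry(state, entry)
    return state


def take_snapshot(log: List[Entry], upto: int) -> Snapshot:
    """Capture the state produced by the first `upto` log entries."""
    return Snapshot(last_index=upto, state=recover(log[:upto]))


LOG: List[Entry] = [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4")]

full_replay = recover(LOG)                        # replay every entry
from_snapshot = recover(LOG, take_snapshot(LOG, 2))  # replay only the suffix

# Both paths reach the same state; the snapshot only shortens the replay.
assert full_replay == from_snapshot
```

The log alone is enough to reconstruct the state; the snapshot just lets 
recovery skip the prefix it already covers.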

-Flavio

On 01 Jun 2014, at 19:22, Michi Mutsuzaki <[email protected]> wrote:

> Hi Ivan,
> 
> The use case this project is going after is to durably replicate
> in-memory state. I think this project can differentiate itself from
> BookKeeper.
> 
> 1. BookKeeper is pretty heavyweight, as you need to deploy ZooKeeper
> and bookies. I think there are use cases where you don't need the
> horizontal scalability BookKeeper provides, and you prefer to have a
> light-weight library for replicating state. ZooKeeper is one such
> example :)
> 2. Please correct me if I'm wrong, but BookKeeper is not designed for
> maintaining multiple in-memory replicas. A ledger can't be opened for
> reading if it's already open for writing, and you need to recover by
> restoring from a snapshot and replaying log entries if the writer goes
> down.
> 3. ZOOKEEPER-30, which I wasn't initially aware of, is another
> motivation. I think there is value in having a common interface for
> consensus algorithms so that services can plug in different
> implementations. This makes it easier to benchmark and test
> correctness of various implementations.
> 
> 
> On Sun, Jun 1, 2014 at 3:05 AM, Ivan Kelly <[email protected]> wrote:
>> On Sat, May 31, 2014 at 02:29:34PM -0700, Michi Mutsuzaki wrote:
>>> I'm hosting an intern this summer. One project I've been thinking
>>> about is to decouple zab from zookeeper. There are many use cases
>>> where you need a quorum based replication, but the hierarchical data
>>> model doesn't work well. A smallish (~1GB?) replicated key-value store
>>> with millions of entries is one such example. The goal of the project
>>> is to decouple the consensus algorithm (zab) from the data model
>>> (zookeeper) more cleanly so that the users can define their own data
>>> models and use zab to replicate the data.
>> So you want a replicated log which gives you the guarantees of zab. How
>> would this be very different from BookKeeper?
>> 
>> -Ivan
