Re: intern project idea: decouple zab from zookeeper

Michi Mutsuzaki Sun, 01 Jun 2014 18:11:50 -0700

Thank you for the clarifications, Flavio. I guess 'heavyweight' is a
relative term. A typical use case I deal with is replicating a small
amount of data (<1GB) among 3 to 5 servers, and having access to zab
would be very useful.


I didn't mean to suggest separating zab within the zookeeper code base. I
referred to ZOOKEEPER-30 to highlight the usefulness of having a
common interface for replication protocols.
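
To make the idea of a common replication interface concrete, here is a minimal
sketch in Java. Every name in it (StateMachine, ReplicatedLog, LocalLog,
KvStore) is hypothetical and is not taken from ZooKeeper or from ZOOKEEPER-30.
LocalLog is an in-process stand-in with no network, quorum, or durability; it
only shows where a real zab implementation could plug in behind the interface.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback: the application-defined data model.
interface StateMachine {
    // Called once per committed transaction, in log order.
    void deliver(long zxid, byte[] txn);
}

// Hypothetical replication interface that services would code against.
interface ReplicatedLog {
    // Appends a transaction and returns the id assigned to it.
    long propose(byte[] txn);
}

// Assumption: single-process stand-in, NOT a consensus implementation.
final class LocalLog implements ReplicatedLog {
    private final List<StateMachine> replicas = new ArrayList<>();
    private long nextZxid = 1;

    void addReplica(StateMachine sm) {
        replicas.add(sm);
    }

    @Override
    public synchronized long propose(byte[] txn) {
        long zxid = nextZxid++;
        // Deliver the same transaction, in the same order, to every replica.
        for (StateMachine sm : replicas) {
            sm.deliver(zxid, txn.clone());
        }
        return zxid;
    }
}

// A flat key-value data model, independent of the replication layer.
final class KvStore implements StateMachine {
    private final Map<String, String> data = new HashMap<>();
    private long lastApplied = 0;

    // Encodes a put as "key=value"; keys must not contain '='.
    static byte[] put(String key, String value) {
        return (key + "=" + value).getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void deliver(long zxid, byte[] txn) {
        String s = new String(txn, StandardCharsets.UTF_8);
        int eq = s.indexOf('=');
        data.put(s.substring(0, eq), s.substring(eq + 1));
        lastApplied = zxid;
    }

    String get(String key) {
        return data.get(key);
    }

    long lastApplied() {
        return lastApplied;
    }
}
```

A service would register one KvStore per replica and send all writes through
ReplicatedLog.propose. Swapping LocalLog for a different implementation would
not touch the data model, which is what makes benchmarking and correctness
testing across implementations easier.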

Thanks!
--Michi


On Sun, Jun 1, 2014 at 2:52 PM, Flavio Junqueira <fpjunque...@yahoo.com> wrote:
> I'm not sure it is worth turning this discussion into a bk vs. zk/zab debate. I 
> think the space they target is different, although they both deal with 
> replication. It does sound worth having a separate zab implementation, but it 
> isn't clear that it is worth separating zab in the zookeeper code base.
>
> There seem to be some misconceptions here, so here are some clarifications:
>
> - Zab itself doesn't deal with snapshots, it essentially replicates a log. 
> The use of snapshots is an optimization to speed up recovery, and sure, it 
> fits well into the framework of the protocol.
> - BookKeeper indeed relies on zk because it requires a component for 
> configuration and metadata of ledgers. By relying on a separate configuration 
> component, the pool of bookies can grow and shrink arbitrarily, and such 
> changes do not affect write performance like with zk. The configuration 
> component, however, needs the properties of a protocol like zab, so we still 
> need something like zab.
> - Calling BK heavyweight is a bit of a stretch. Bookies + zk makes only two 
> components! These are not production numbers, but I don't see a deployment 
> with fewer than 10 machines (5 for ZK + 5 bookies) being very interesting. If 
> that's a significant fraction of your overall server footprint, then sure, it 
> is heavy for you.
>
> -Flavio
>
> On 01 Jun 2014, at 19:22, Michi Mutsuzaki <mi...@cs.stanford.edu> wrote:
>
>> Hi Ivan,
>>
>> The use case this project is going after is to durably replicate
>> in-memory state. I think this project can differentiate itself from
>> BookKeeper.
>>
>> 1. BookKeeper is pretty heavyweight, as you need to deploy ZooKeeper
>> and bookies. I think there are use cases where you don't need the
>> horizontal scalability BookKeeper provides, and you prefer to have a
>> light-weight library for replicating state. ZooKeeper is one such
>> example :)
>> 2. Please correct me if I'm wrong, but BookKeeper is not designed for
>> maintaining multiple in-memory replicas. A ledger can't be opened for
>> reading if it's already open for writing, and you need to recover by
>> restoring from a snapshot and replaying log entries if the writer goes
>> down.
>> 3. ZOOKEEPER-30, which I wasn't initially aware of, is another
>> motivation. I think there is value in having a common interface for
>> consensus algorithms so that services can plug in different
>> implementations. This makes it easier to benchmark and test the
>> correctness of various implementations.
>>
>>
>> On Sun, Jun 1, 2014 at 3:05 AM, Ivan Kelly <iv...@apache.org> wrote:
>>> On Sat, May 31, 2014 at 02:29:34PM -0700, Michi Mutsuzaki wrote:
>>>> I'm hosting an intern this summer. One project I've been thinking
>>>> about is to decouple zab from zookeeper. There are many use cases
>>>> where you need quorum-based replication, but the hierarchical data
>>>> model doesn't work well. A smallish (~1GB?) replicated key-value store
>>>> with millions of entries is one such example. The goal of the project
>>>> is to decouple the consensus algorithm (zab) from the data model
>>>> (zookeeper) more cleanly so that the users can define their own data
>>>> models and use zab to replicate the data.
>>> So you want a replicated log which gives you the guarantees of zab. How
>>> would this be very different from BookKeeper?
>>>
>>> -Ivan
>
