Re: [Proposal] Support "re-open, append" semantic in BookKeeper.

Flavio Junqueira Mon, 14 Jan 2013 01:07:16 -0800

Thanks for your proposal, Sijie. The cost of owning a topic seems indeed high, 
and your motivation to raise that point seems to be that upon hub crashes there 
will be a spike of accesses to the metadata store. According to your analysis, 
the size of the spike is linear in the number of topics the faulty hub owns. 
Your proposal just changes the constant, but it is still linear, yes?
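
To make the counting concrete, here is a small sketch of the arithmetic. It uses
the per-topic figures from the breakdown in the quoted message below (2 metadata
reads and 4 metadata writes per acquired topic). The class and method names are
made up for illustration and are not part of Hedwig or BookKeeper.

```java
public class OwnershipSpike {
    // Per-topic costs taken from the breakdown in the quoted proposal.
    static final int READS_PER_TOPIC = 2;
    static final int WRITES_PER_TOPIC = 4;

    // Total metadata operations when a hub's topics are re-acquired,
    // given some per-topic cost. Linear in the number of topics.
    static long metadataOps(long topics, int opsPerTopic) {
        return topics * opsPerTopic;
    }

    // Cost with the current "close old ledger, create new ledger" scheme.
    static long currentOps(long topics) {
        return metadataOps(topics, READS_PER_TOPIC + WRITES_PER_TOPIC);
    }

    public static void main(String[] args) {
        for (long topics : new long[] {1_000, 10_000, 100_000}) {
            System.out.println(topics + " topics -> "
                    + currentOps(topics) + " metadata operations");
        }
    }
}
```

Any scheme with a smaller per-topic cost lowers the slope of this line but not
its shape: calling metadataOps with a smaller opsPerTopic still grows with the
number of topics the failed hub owned.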


Although I think I understand the motivation for your proposal, I'm not really 
in favor of extending the BK API like this. It adds a number of new concurrent 
scenarios, increasing the complexity of what we expose, and it violates one 
principle that we used when first designing this system, which is keeping 
functionality to a bare minimum so that we can implement it efficiently. Other 
functionality needed can be implemented on top. In fact, isn't it the case that 
you can do at least some of what you're proposing with managed ledgers?

One point you raised that concerns me a bit is the cost of a close operation. I 
need to review it because I'm not sure why we currently need 3 accesses to the 
metadata store. It should really be just one in the regular case (no concurrent 
attempts to close a ledger through ledger recovery). I agree though that we 
need to think about how to deal efficiently with the hedwig issue you're 
raising.

-Flavio

On Jan 14, 2013, at 4:53 AM, Sijie Guo <[email protected]> wrote:

> Hello all,
> 
> Currently Hedwig uses *ledgers* to store messages for a topic. It requires
> lots of metadata operations when a hub server acquires a topic. These
> metadata operations are:
> 
>   1. read topic persistence info for a topic. (1 metadata read operation)
>   2. close the last opened ledger. (1 metadata read operation, 2 metadata
>   write operations)
>   3. create a new ledger to write. (1 metadata write operation)
>   4. update topic persistence info for the topic to track the new ledger.
>   (1 metadata write operation)
> 
> So there are at least 2 metadata read operations and 4 metadata write
> operations when acquiring a topic. If a hub server that owns lots of topics
> restarts, it introduces a spike of accesses to the metadata storage
> (e.g. ZooKeeper).
> 
> Currently Hedwig's design originates from the ledger's *"write once, read
> many"* semantic.
> 
>   1. Ledger ids are generated by BookKeeper. Hedwig needs to record the
>   ledger id in extra places, which introduces extra metadata accesses.
>   2. A ledger cannot accept any more entries after it is closed => so
>   Hedwig has to create a new ledger to write new entries after the ownership
>   of a topic changes (e.g. hub server failure, topic release).
>   3. A ledger's entries can be *deleted* only when the ledger itself is
>   deleted => so Hedwig has to switch ledgers, so that entries can be
>   reclaimed by *deleting* a ledger after all subscribers have consumed it.
> 
> I propose two new APIs, accompanied by a "re-open, append" semantic in
> BookKeeper, for high-performance metadata access and easy metadata
> management for applications.
> 
> public void openLedger(String ledgerName, DigestType digestType,
> byte[] passwd, Mode mode);
> 
> *Mode* indicates the access mode of a ledger, which is one of *O_CREATE*,
> *O_APPEND*, *O_RDONLY*.
> 
>   - O_CREATE: create a new ledger with the given ledger name. If a ledger
>   with that name already exists, fail the creation. Similar to createLedger
>   now.
>   - O_APPEND: open the ledger with the given ledger name and continue
>   writing entries.
>   - O_RDONLY: open the ledger w/o changing any state, just reading
>   entries already persisted. Similar to openLedgerNoRecovery now.
> 
> *ledgerName* indicates the name of a ledger. A user can pick any name
> they like, so they can manage ledgers in their own way, e.g. by introducing
> a namespace over them, instead of BookKeeper generating ledger ids for them.
> (In most cases, the application needs to find another place to store the
> generated ledger id. That practice is really bad.)
> 
> public void shrink(long endEntryId, boolean force) throws BKException;
> 
> *Shrink* means cutting the entries from *startEntryId* to *endEntryId*
> (endEntryId is non-inclusive). *startEntryId* is implicit in the ledger
> metadata: it is 0 for a ledger that has never been shrunk, and otherwise
> it is the *endEntryId* of the previous valid shrink.
> 
> The 'force' flag indicates whether to issue a garbage collection request
> after moving *startEntryId* to *endEntryId*. If the flag is true, we issue
> a garbage collection request to notify the bookie servers to do garbage
> collection; otherwise, we just move *startEntryId* to *endEntryId*. This
> feature might be useful for some applications. Take Hedwig for example: we
> could leverage this feature to avoid storing the subscriber state for
> topics that have only one subscriber each. Each time a specific number of
> messages has been consumed, we move the entry point with
> *shrink(entryId, false)*. After several more messages have been consumed,
> we garbage collect them with *shrink(entryId, true)*.
> 
> Using *shrink*, an application could reclaim the disk space occupied by a
> ledger w/o creating a new ledger and deleting the old one.
> 
> These two operations are based on two mechanisms: one is 'session fencing',
> and the other one is 'improved garbage collection (BOOKKEEPER-464)'.
> Details are in the gist https://gist.github.com/4520260 . I will try to
> start working on some drafts based on this idea to demonstrate its
> correctness.
> 
> Comments and discussion are welcome.
> -Sijie
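
For readers following the thread, the semantics of the two proposed calls can be
modelled with a toy in-memory sketch. This is only an illustration of the
behaviour described in the quoted message, not BookKeeper code: the class
NamedLedgerSketch and its nested Ledger type are hypothetical, the password and
digest parameters are dropped, session fencing and bookie-side garbage
collection are not modelled, and the choices to fail O_APPEND/O_RDONLY on a
missing ledger and to reject an endEntryId outside [startEntryId, entry count]
are assumptions the proposal does not spell out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NamedLedgerSketch {
    public enum Mode { O_CREATE, O_APPEND, O_RDONLY }

    /** In-memory stand-in for one named ledger (hypothetical type). */
    public static final class Ledger {
        private final List<byte[]> entries = new ArrayList<>(); // index == entry id
        private long startEntryId = 0;  // 0 for a ledger that was never shrunk
        private long collectedUpTo = 0; // entries below this id were reclaimed

        public long startEntryId() { return startEntryId; }
        public long collectedUpTo() { return collectedUpTo; }

        /** Appends an entry and returns its entry id. */
        public long append(byte[] data) {
            entries.add(data);
            return entries.size() - 1;
        }

        /** Cuts entries in [startEntryId, endEntryId); endEntryId is exclusive. */
        public void shrink(long endEntryId, boolean force) {
            // Assumed validity rule: never move backwards or past the last entry.
            if (endEntryId < startEntryId || endEntryId > entries.size()) {
                throw new IllegalArgumentException("invalid endEntryId: " + endEntryId);
            }
            startEntryId = endEntryId; // always just a metadata move
            if (force) {
                // Stand-in for the garbage collection request to the bookies.
                for (long id = collectedUpTo; id < endEntryId; id++) {
                    entries.set((int) id, null);
                }
                collectedUpTo = endEntryId;
            }
        }
    }

    private final Map<String, Ledger> ledgers = new HashMap<>();

    /** Opens a ledger by application-chosen name. */
    public Ledger open(String ledgerName, Mode mode) {
        Ledger existing = ledgers.get(ledgerName);
        if (mode == Mode.O_CREATE) {
            if (existing != null) {
                throw new IllegalStateException("ledger exists: " + ledgerName);
            }
            Ledger created = new Ledger();
            ledgers.put(ledgerName, created);
            return created;
        }
        // O_APPEND and O_RDONLY both return the existing ledger here; this
        // sketch does not enforce read-only access or model fencing.
        if (existing == null) {
            throw new IllegalStateException("no such ledger: " + ledgerName);
        }
        return existing;
    }
}
```

In the Hedwig scenario from the proposal, a hub would open("topic-a",
Mode.O_APPEND) when it acquires the topic, call shrink(entryId, false) as the
single subscriber makes progress, and occasionally call shrink(entryId, true)
so that the space is actually reclaimed.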
