[Proposal] Support "re-open, append" semantic in BookKeeper.

Sijie Guo Sun, 13 Jan 2013 19:53:58 -0800

Hello all,

Currently Hedwig uses *ledgers* to store messages for a topic. This requires
many metadata operations whenever a hub server acquires ownership of a
topic. These metadata operations are:


   1. read topic persistence info for a topic. (1 metadata read operation)
   2. close the last opened ledger. (1 metadata read operation, 2 metadata
   write operations)
   3. create a new ledger to write. (1 metadata write operation)
   4. update topic persistence info for the topic to track the new ledger.
   (1 metadata write operation)

So there are at least 2 metadata read operations and 4 metadata write
operations when acquiring a topic. If a hub server that owns many topics
restarts, it causes a spike of metadata accesses to the metadata storage
(e.g. ZooKeeper).
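
To make the size of that spike concrete, the per-step counts above can be
tallied as in the sketch below. The class name and the topic count are
made up for illustration, and the result is a lower bound, since the
steps above are "at least" figures.

```java
// Tally of metadata operations per topic acquisition, taken from the
// four steps listed above. Not Hedwig code; names are hypothetical.
final class AcquireCost {
    // {reads, writes} for each step, in the order listed above.
    static final int[][] STEPS = {
        {1, 0}, // 1. read topic persistence info
        {1, 2}, // 2. close the last opened ledger
        {0, 1}, // 3. create a new ledger
        {0, 1}, // 4. update topic persistence info
    };

    static int readsPerTopic() {
        int n = 0;
        for (int[] s : STEPS) n += s[0];
        return n;
    }

    static int writesPerTopic() {
        int n = 0;
        for (int[] s : STEPS) n += s[1];
        return n;
    }

    // Total metadata operations when `topics` topics are re-acquired.
    static long totalOps(long topics) {
        return topics * (readsPerTopic() + writesPerTopic());
    }

    public static void main(String[] args) {
        long topics = 10_000; // hypothetical number of topics on one hub
        System.out.println(readsPerTopic() + " reads, " + writesPerTopic()
            + " writes per topic; " + totalOps(topics) + " ops in total");
    }
}
```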

Hedwig's current design originates from the ledger's *"write once, read
many"* semantic.

   1. The ledger id is generated by bookkeeper. Hedwig needs to record the
   ledger id somewhere else, which introduces extra metadata accesses.
   2. No more entries can be written to a ledger after it is closed => so
   hedwig has to create a new ledger to write new entries after the ownership
   of a topic changes (e.g. hub server failure, topic release).
   3. A ledger's entries can be *deleted* only when the ledger itself is
   deleted => so hedwig has to switch ledgers, so that entries can be
   reclaimed by *deleting* a ledger after all subscribers have consumed it.

I propose two new APIs, together with a "re-open, append" semantic in
BookKeeper, for high-performance metadata access and easier metadata
management for applications.

public void openLedger(String ledgerName, DigestType digestType,
byte[] passwd, Mode mode);

*Mode* indicates the access mode of a ledger, which is one of *O_CREATE*,
*O_APPEND*, or *O_RDONLY*.

   - O_CREATE: create a new ledger with the given ledger name. If a ledger
   with that name already exists, the creation fails. Similar to
   createLedger now.
   - O_APPEND: open the ledger with the given ledger name and continue
   writing entries.
   - O_RDONLY: open the ledger without changing any state, only reading
   entries already persisted. Similar to openLedgerNoRecovery now.

*ledgerName* indicates the name of a ledger. Users can pick any name they
like, so they can manage their ledgers in their own way, for example by
introducing a namespace over the names, instead of having bookkeeper
generate ledger ids for them. (In most cases, an application needs to find
another place to store the generated ledger id, which is bad practice.)
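
A minimal in-memory sketch of how the three modes could behave is shown
below. It is only a model of the semantics described above, not
BookKeeper code: the NamedLedgerStore and Handle classes are hypothetical,
digestType and passwd are left out for brevity, and session fencing of
earlier writers is not modelled. The proposal does not say what happens
when O_APPEND or O_RDONLY is used on a missing ledger; the sketch assumes
it fails.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model of the proposed open modes; not BookKeeper code.
enum Mode { O_CREATE, O_APPEND, O_RDONLY }

final class NamedLedgerStore {
    // ledger name -> entries; stands in for ledger metadata plus bookie storage
    private final Map<String, List<byte[]>> ledgers = new HashMap<>();

    Handle openLedger(String ledgerName, Mode mode) {
        List<byte[]> entries = ledgers.get(ledgerName);
        switch (mode) {
            case O_CREATE:
                // Fail if a ledger with this name already exists.
                if (entries != null) {
                    throw new IllegalStateException("ledger exists: " + ledgerName);
                }
                entries = new ArrayList<>();
                ledgers.put(ledgerName, entries);
                return new Handle(entries, true);
            case O_APPEND:
            case O_RDONLY:
                // Assumption: opening a missing ledger fails in both modes.
                if (entries == null) {
                    throw new IllegalStateException("no such ledger: " + ledgerName);
                }
                return new Handle(entries, mode == Mode.O_APPEND);
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    static final class Handle {
        private final List<byte[]> entries;
        private final boolean writable;

        Handle(List<byte[]> entries, boolean writable) {
            this.entries = entries;
            this.writable = writable;
        }

        // Appends an entry and returns its entry id.
        long addEntry(byte[] data) {
            if (!writable) {
                throw new UnsupportedOperationException("opened O_RDONLY");
            }
            entries.add(data);
            return entries.size() - 1;
        }

        // Id of the last entry written, or -1 for an empty ledger.
        long lastEntryId() {
            return entries.size() - 1;
        }
    }
}
```

With names chosen by the application, a hub could derive the ledger name
from the topic name and re-open it with O_APPEND on every acquisition,
without recording a generated ledger id anywhere else.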

public void shrink(long endEntryId, boolean force) throws BKException;

*Shrink* means cutting the entries from *startEntryId* to *endEntryId*
(endEntryId is non-inclusive). *startEntryId* is implicit in the ledger
metadata: it is 0 for a ledger that has never been shrunk, and otherwise it
is the *endEntryId* of the previous valid shrink.

The 'force' flag indicates whether to issue a garbage collection request
after moving *startEntryId* to *endEntryId*. If the flag is true, we issue
a garbage collection request to notify the bookie servers to do garbage
collection; otherwise, we only move *startEntryId* to *endEntryId*. This
feature might be useful for some applications. Take Hedwig for example: we
could use it to avoid storing subscriber state for topics that have only
one subscriber each. Each time a specific number of messages has been
consumed, we move the entry point with *shrink(entryId, false)*. After more
messages have been consumed, we garbage collect them with
*shrink(entryId, true)*.

Using *shrink*, an application can reclaim the disk space occupied by a
ledger without creating a new ledger and deleting the old one.
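
The shrink semantics described above can be sketched with a small
in-memory model. This is an illustration only, not BookKeeper code: the
ShrinkableLedger class is hypothetical, clearing a local map stands in for
the garbage collection request sent to bookies, and the validity check
(no moving backwards, no shrinking past the last entry added) is an
assumption about what a "valid shrink" means.

```java
import java.util.TreeMap;

// In-memory model of the proposed shrink semantics; not BookKeeper code.
final class ShrinkableLedger {
    // Entries still occupying storage, keyed by entry id.
    private final TreeMap<Long, byte[]> stored = new TreeMap<>();
    private long startEntryId = 0; // 0 for a ledger that was never shrunk
    private long nextEntryId = 0;

    long addEntry(byte[] data) {
        stored.put(nextEntryId, data);
        return nextEntryId++;
    }

    // Cuts entries [startEntryId, endEntryId); endEntryId is non-inclusive.
    void shrink(long endEntryId, boolean force) {
        // Assumption: a valid shrink never moves backwards or past the
        // last entry added.
        if (endEntryId < startEntryId || endEntryId > nextEntryId) {
            throw new IllegalArgumentException("invalid endEntryId: " + endEntryId);
        }
        startEntryId = endEntryId; // metadata-only step
        if (force) {
            // Stands in for the garbage collection request to bookies:
            // drop every stored entry with id < endEntryId.
            stored.headMap(endEntryId, false).clear();
        }
    }

    long startEntryId() {
        return startEntryId;
    }

    // Entries below startEntryId are not readable, even if not yet collected.
    boolean isReadable(long entryId) {
        return entryId >= startEntryId && stored.containsKey(entryId);
    }

    // Number of entries still occupying storage.
    int storedCount() {
        return stored.size();
    }
}
```

In the single-subscriber Hedwig case, the hub would call shrink(entryId,
false) as the consume pointer advances, so that startEntryId itself records
the subscriber position, and call shrink(entryId, true) less often to
actually free the space.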

These two operations are based on two mechanisms: one is 'session fencing',
and the other is 'improved garbage collection (BOOKKEEPER-464)'. Details
are in the gist https://gist.github.com/4520260 . I will try to start
working on some drafts based on the idea to demonstrate its correctness.

Comments and discussion are welcome.
-Sijie
