> Originally, it was meant to have a number of long lived subscriptions,
> over which a lot of data travelled. Now the load has flipped to a large
> number of short lived subscriptions, over which relatively little data
> travels.

The topic discussed here doesn't relate to Hedwig subscriptions; it is
just about how Hedwig uses ledgers to store its messages. Even if there
are no subscriptions, the problem is still there. Restarting a hub server
that carries a large number of topics hits the metadata storage with many
accesses. The hit happens when a hub server acquires a topic, no matter
whether the subscriptions are long lived or short lived. After the topic
is acquired, subsequent accesses are served from memory, which doesn't
cause any performance issue.
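To make the cost concrete, here is a rough sketch of what topic
acquisition looks like in terms of metadata round trips. The names below
(MetadataStore, TopicInfo, HubServer) are illustrative stand-ins, not
Hedwig's actual classes:

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical stand-in for the metadata store (ZooKeeper today).
    interface MetadataStore {
        void claimOwnership(String topic) throws Exception;
        List<Long> readPersistenceInfo(String topic) throws Exception;
        byte[] readLedgerMetadata(long ledgerId) throws Exception;
    }

    class TopicInfo {
        final List<Long> ledgerIds;
        TopicInfo(List<Long> ledgerIds) { this.ledgerIds = ledgerIds; }
    }

    class HubServer {
        private final MetadataStore metadata;
        private final Map<String, TopicInfo> acquired =
                new ConcurrentHashMap<String, TopicInfo>();

        HubServer(MetadataStore metadata) { this.metadata = metadata; }

        // Acquiring one topic costs several metadata round trips; a
        // restarted hub re-acquiring N topics multiplies this by N,
        // with or without any subscriptions on those topics.
        void acquireTopic(String topic) throws Exception {
            metadata.claimOwnership(topic);                   // 1 CAS write
            List<Long> ledgers =
                metadata.readPersistenceInfo(topic);          // 1 read
            for (long ledgerId : ledgers) {
                metadata.readLedgerMetadata(ledgerId);        // 1 read each
            }
            acquired.put(topic, new TopicInfo(ledgers));
        }

        // After acquisition, lookups are served from memory only.
        TopicInfo getTopic(String topic) { return acquired.get(topic); }
    }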
> We can embed some other store or write our own. Anything we would do
> would need to be persistent, support CAS, and replicated.

First of all, I am interested in the idea of having a built-in metadata
storage for both Hedwig and BookKeeper, although I think it is too
complex to implement a distributed, robust and scalable metadata storage
from scratch. But we should separate the capacity problem from the
software problem. A high performance and scalable metadata storage would
help resolve the capacity problem, but neither implementing a new one nor
leveraging an existing high performance one changes the fact that
acquiring a topic still needs so many metadata accesses. An
implementation that requires that many metadata accesses is a software
problem. If we have a chance to improve it, why not?

> The ledger can still be read many times, but you have removed the
> guarantee that what is read each time will be the same thing.

How do we guarantee a reader's behavior when the ledger is removed at the
same time? We don't guarantee that right now, right? Isn't a 'shrink'
operation, which removes part of the entries, similar to a 'delete'
operation, which removes all of them? And if I remember correctly,
readers are only guaranteed to see the same thing once a ledger is
closed. What I proposed doesn't violate this contract. If a ledger is
closed (its state is CLOSED), an application can't re-open it. If a
ledger isn't closed yet, an application can recover the previous state
and continue writing entries using this ledger. Applications could still
use ledgers in the 'create-close-create' style, or smoothly evolve to the
new api for efficiency, without breaking any backward compatibility.
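To illustrate the intended semantics, here is a minimal sketch of the
state rules I have in mind; the class below is hypothetical, not an
existing BookKeeper interface:

    // Minimal sketch of ledger states under the proposal.
    class ProposedLedger {
        enum State { OPEN, IN_RECOVERY, CLOSED }

        private State state = State.OPEN;
        private long lastEntryId = -1;

        // Re-opening a ledger that is not yet CLOSED recovers its
        // previous state and lets the application keep writing to it.
        void reopen() {
            if (state == State.CLOSED) {
                // A CLOSED ledger stays immutable, so readers keep
                // today's guarantee: a closed ledger always reads the same.
                throw new IllegalStateException("cannot re-open CLOSED ledger");
            }
            state = State.IN_RECOVERY;  // in the real system: a metadata
                                        // CAS that fences the old writer
            // ... recover lastEntryId from the bookies ...
            state = State.OPEN;         // writing continues from lastEntryId
        }

        long addEntry(byte[] data) {
            if (state != State.OPEN) {
                throw new IllegalStateException("ledger not open for writing");
            }
            return ++lastEntryId;
        }

        void close() { state = State.CLOSED; }
    }

An application that never calls reopen() behaves exactly as it does
today, which is what keeps the 'create-close-create' style working
unchanged.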
And one more point: whether we use a user defined name or a generated
ledger id is not a big problem for the bookkeeper system. As
BOOKKEEPER-438 (https://issues.apache.org/jira/browse/BOOKKEEPER-438)
plans to move ledger id generation out of LedgerManager, LedgerManager
would just focus on how to store ledger metadata by a ledger key (the key
could be a user defined string/path, or a generated long ledger id). From
the perspective of keeping the system's functionality to a bare minimum,
a BookKeeper client with ledger id generation could be built on top of a
BookKeeper client that supports custom ledger names, couldn't it? :-)
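The split might look roughly like the sketch below. The interface names
and signatures here are my own illustration of that direction, not the
actual BOOKKEEPER-438 work:

    // Ledger id generation split away from metadata storage.
    interface LedgerIdGenerator {
        // Optional layer for clients that want generated long ids.
        long nextLedgerId() throws Exception;
    }

    // The manager only maps an opaque ledger key to metadata. The key may
    // be a user defined string/path, or the string form of a generated id.
    interface KeyedLedgerManager {
        void createLedgerMetadata(String ledgerKey, byte[] metadata)
                throws Exception;
        byte[] readLedgerMetadata(String ledgerKey) throws Exception;
        // CAS on a version so concurrent writers cannot clobber each other.
        void writeLedgerMetadata(String ledgerKey, byte[] metadata,
                                 int expectedVersion) throws Exception;
        void removeLedgerMetadata(String ledgerKey) throws Exception;
    }

A client that wants generated ids would compose LedgerIdGenerator with
KeyedLedgerManager; a client with its own naming scheme would use the
manager directly.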
-Sijie

On Tue, Jan 15, 2013 at 3:42 AM, Ivan Kelly <[email protected]> wrote:
> > Let me clarify the 3 accesses for a close operation.
> >
> > 1) first read the ledger metadata. (1 metadata read)
> > 2) if the ledger state is in open state, write ledger metadata to
> > IN_RECOVERY state. (1 metadata write)
> > 3) do ledger recovery procedure
> > 4) update last entry id and change state to CLOSED. (1 metadata write)
> >
> > I am not very sure about whether we could get rid of step 2). But at
> > least, we still have 1 read and 1 write for a close operation. (NOTE:
> > close here is not create a ledger and close it. it is the *close* when
> > open to recover)
> You cannot get rid of 2) without sacrificing correctness.
>
> > The big problem for current bookkeeper is applications (either Hedwig
> > or managed ledger) need to find extra places to record the ledger ids
> > generated by bookkeeper. It is not efficient and also duplicates the
> > metadata storage; especially for a system facing a large number of
> > topics or ledgers, it is an Achilles' heel.
> I think a better way to handle this would be just to scale the
> metadata storage along with the system. Part of the problem here is
> that hedwig is being used in a way which is quite different to what it
> was first designed for. Originally, it was meant to have a number of
> long lived subscriptions, over which a lot of data travelled. Now the
> load has flipped to a large number of short lived subscriptions, over
> which relatively little data travels.
>
> This means that the disk capacity of the bookies isn't fully used. So
> how about using that for metadata also? We can embed some other store
> or write our own. Anything we would do would need to be persistent,
> support CAS, and replicated. It would be a fairly hefty project, but
> it would give us horizontal scalability and reduce the number of
> moving parts required to provide this scalability.
>
> For writing our own, we could have a metadata store that sits inside
> the bookie, sharing the journal and snapshotting every so often, so it
> should barely affect performance.
>
> > I took care of re-using existing concepts with minimum changes to
> > extend the api to provide more flexibility and efficiency for
> > applications. I don't think it violates the principle of keeping the
> > system a bare minimum. For example, I don't add a cursor concept to
> > ledgers. A ledger can still be read many times. How to shrink a ledger
> > depends on the application: Hedwig or managed ledger would record
> > their cursors (subscriber states) and shrink a ledger when they decide
> > to do that.
> The ledger can still be read many times, but you have removed the
> guarantee that what is read each time will be the same thing.
>
> -Ivan
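As a postscript, the close-on-recovery sequence in steps 1)-4) quoted
above, as a rough sketch; Store and the other names here are
illustrative, not BookKeeper's real metadata api:

    interface Store {
        LedgerMeta read(long ledgerId) throws Exception;
        void casWrite(long ledgerId, LedgerMeta md, int expectedVersion)
                throws Exception;
    }

    class LedgerMeta {
        enum State { OPEN, IN_RECOVERY, CLOSED }
        State state = State.OPEN;
        long lastEntryId = -1;
        int version = 0;
    }

    class RecoveryClose {
        static void recoverAndClose(Store store, long ledgerId)
                throws Exception {
            LedgerMeta md = store.read(ledgerId);        // 1) 1 metadata read
            if (md.state == LedgerMeta.State.OPEN) {
                md.state = LedgerMeta.State.IN_RECOVERY;
                // 2) This CAS fences the previous writer; dropping it would
                // let a still-live writer race the recovery, which is why
                // it cannot be removed without sacrificing correctness.
                store.casWrite(ledgerId, md, md.version++); // 1 metadata write
            }
            md.lastEntryId = recoverFromBookies(ledgerId);  // 3) recovery
            md.state = LedgerMeta.State.CLOSED;
            store.casWrite(ledgerId, md, md.version++);     // 4) 1 metadata write
        }

        static long recoverFromBookies(long ledgerId) {
            return 0L; // stand-in for the actual ledger recovery procedure
        }
    }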