> So.. log truncation, the way it's needed in leader based systems like RAFT
> and Kafka, where leader may have entries appended to its log which are not
> replicated. If leader crashes before replicating entries, which will elect
> other node as leader. Once the previous leader rejoins the cluster, it
> needs to truncate its own log removing all the conflicting entries. This
> case wont happen in bookkeeper?

Something similar does happen in bookkeeper. Firstly, it's important
to keep in mind that a single ledger in bookkeeper only has a single
writer ever. If the writer crashes, no new entries can be added to
that ledger. In this way, you can kinda think of a ledger as a term in
RAFT or an epoch in ZK. To build a replicated log in bookkeeper, you
must chain a bunch of ledgers together. BK leaves that to the user.
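
To make the "chain of ledgers" idea concrete, here is a rough sketch in Python. It is a toy model only: Ledger, ChainedLog, open_ledger and read_all are names made up for illustration and are not part of the BookKeeper API, and closing is reduced to setting a flag (no recovery, no metadata store).

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Toy ledger: exactly one writer, append-only until closed."""
    writer: str
    entries: list = field(default_factory=list)
    closed: bool = False

    def append(self, writer, entry):
        # Only the original writer may add entries, and only while open.
        if self.closed or writer != self.writer:
            raise PermissionError("ledger is closed or owned by another writer")
        self.entries.append(entry)


class ChainedLog:
    """Toy replicated log: an ordered chain of ledgers (hypothetical)."""

    def __init__(self):
        self.ledgers = []

    def open_ledger(self, writer):
        # A new writer closes the previous ledger before starting its own.
        if self.ledgers:
            self.ledgers[-1].closed = True
        ledger = Ledger(writer)
        self.ledgers.append(ledger)
        return ledger

    def read_all(self):
        # The log is the concatenation of all ledgers, in chain order.
        return [e for ledger in self.ledgers for e in ledger.entries]
```

Each ledger plays the role of one term: once w2 opens the next ledger, w1 can no longer append to the old one.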

In the case of a writer crash, the next writer (i.e. the client adding
the next ledger to the chain) needs to run the recovery algorithm,
which finds the last entry which may possibly have been acknowledged
to the writer. It uses this last entry to mark the ledger as closed.
This "close" operation is similar to a truncate. Individual bookies in
the ensemble may have entries past this last entry. However, these
entries do not exist on enough bookies for the entry to have been
acknowledged as written, so they can be ignored.

For example, say you have a ledger A across 2 bookies, b1 and b2,
being written to by writer w1, with ensemble size 2, write quorum 2
and ack quorum 2.

w1 crashes when the bookies have the following entries.

b1: e1
b2: e1, e2

The next writer, w2, could close this ledger at either e1 or e2. Both
are correct.
For e1, it would try to read the last entry from both b1 & b2, but
only b1 would reply. w2 would see that e1 is the last entry on b1 and,
as ack quorum is 2, it knows that no entry beyond e1 can have been
acknowledged to w1 (to acknowledge to the writer, acknowledgement must
be received from |ack quorum| bookies).
For e2, it would try to read the last entry from both b1 & b2, and
either b2 or both would reply. If both replied, w2 would see that e2 was
written by the client, but not acknowledged to w1. However, it is also
possible that only b2 replied, so w2 cannot divine whether e2 was
acknowledged to w1. In both cases, it's safe to take e2 as the last
entry. w2 ensures that e2 is replicated to |ack quorum| bookies, and
marks it as the end of the ledger.
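
The choice w2 makes in the example above can be sketched as a small Python simulation. This is a simplified model of the description in this mail, not BookKeeper's actual recovery code: recover_last_entry is a made-up name, entries are plain integers (1 for e1, 2 for e2), and the re-replication and metadata update steps are left out. The "enough replies" rule is my assumption of the minimum needed so that every possible ack quorum overlaps the set of bookies that answered.

```python
def recover_last_entry(bookies, responders, ack_quorum):
    """Pick the entry a recovering writer would close the ledger at.

    bookies:    dict of bookie name -> list of entry ids stored there
    responders: names of the bookies that answered the read request
    ack_quorum: number of bookie acks needed before the writer is acked
    """
    # An acknowledged entry sits on at least ack_quorum bookies, so hearing
    # from (ensemble - ack_quorum + 1) bookies guarantees we see it.
    needed = len(bookies) - ack_quorum + 1
    if len(responders) < needed:
        raise RuntimeError("not enough bookies replied to recover safely")

    # Close at the highest entry seen on any bookie that replied. Entries
    # beyond it on silent bookies were never acknowledged and are ignored.
    return max(max(bookies[name], default=0) for name in responders)


# State at the moment w1 crashed: b1 has e1, b2 has e1 and e2.
ledger_a = {"b1": [1], "b2": [1, 2]}

close_if_only_b1 = recover_last_entry(ledger_a, ["b1"], ack_quorum=2)
close_if_only_b2 = recover_last_entry(ledger_a, ["b2"], ack_quorum=2)
close_if_both = recover_last_entry(ledger_a, ["b1", "b2"], ack_quorum=2)
```

With only b1 replying the ledger closes at e1, and the copy of e2 on b2 is effectively dropped. With b2 or both replying it closes at e2, after which the real client would still have to copy e2 to enough bookies.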

The case where e1 was found to be the last entry can be considered
similar to truncate.

-Ivan
