> No you cannot miss data, if the client is not able to find a bookie that
is
> able to answer with the entry it receives an error.

Let's say we have WQ 3 and AQ 2. An add (e100) has reached AQ and the
confirm callback to the client is called and the LAC is set to 100. Now the
3rd bookie times out. Ensemble change is executed and all pending adds that
are above the LAC of 100 are replayed to another bookie, meaning that the
entry e100 is not replayed to another bookie, causing this entry to meet
the rep factor of only AQ. This is alluded to in the docs as they state
that AQ is also the minimum guaranteed replication factor.

> The recovery read fails if it is not possible to read every entry from at
> least AQ bookies  (that is it allows WQ-QA read failures),
> and it does not hazard to "repair" (truncate) the ledger if it does not
> find enough bookies.

This is not quite accurate. A single successful read is enough. However
there is an odd behaviour in that as soon as (WQ-AQ)+1 reads fail with
explicit NoSuchEntry/Ledger, the read is considered failed and the ledger
recovery process ends there. This means that given the responses
b1->success, b2->NoSuchLedger, b3->NoSuchLedger, whether the read is
considered successful is non-deterministic. If the response from b1 is
received last, then the read is already considered failed, otherwise the
read succeeds.

I have come to the above conclusions through my reverse engineering process
for creating the TLA+ specification. I still have pending to
reproduce the AQ rep factor behaviour via some tests, but have verified via
tests the conclusion about ledger recovery reads.

Note that I have found two defects with the BookKeeper protocol, most
notably data loss due to that fencing does not prevent further successful
adds. Currently the specification and associated documentation is on a
private Splunk repo, but I'd be happy to set up a meeting to discuss the
spec and its findings.

Best
Jack


On Fri, Jan 15, 2021 at 11:51 AM Enrico Olivelli <eolive...@gmail.com>
wrote:

> [ External sender. Exercise caution. ]
>
> Jonathan,
>
> Il giorno gio 14 gen 2021 alle ore 20:57 Jonathan Ellis <
> jbel...@apache.org>
> ha scritto:
>
> > On 2021/01/11 08:31:03, Jack Vanlightly wrote:
> > > Hi,
> > >
> > > I've recently modelled the BookKeeper protocol in TLA+ and can confirm
> > that
> > > once confirmed, that an entry is not replayed to another bookie. This
> > > leaves a "hole" as the entry is now replicated only to 2 bookies,
> > however,
> > > the new data integrity check that Ivan worked on, when run periodically
> > > will be able to repair that hole.
> >
> > Can I read from the bookie with a hole in the meantime, and silently miss
> > data that it doesn't know about?
> >
>
> No you cannot miss data, if the client is not able to find a bookie that is
> able to answer with the entry it receives an error.
>
> The ledger has a known tail (LastAddConfirmed entry) and this value is
> stored on ledger metadata once the ledger is "closed".
>
> When the ledger is still open, that is when the writer is writing to it,
> the reader is allowed to read only up to the LastAddConfirmed entry
> this LAC value is returned to the reader using a piggyback mechanism,
> without reading from metadata.
> The reader cannot read beyond the latest position that has been confirmed
> to the writer by AQ bookies.
>
> We have a third case, the 'recovery read'.
> A reader starts a "recovery read" when you want to recover a ledger that
> has been abandoned by a dead writer
> or when you are a new leader (Pulsar Bundle Owner) or you want to fence out
> the old leader.
> In this case the reader merges the current status of the ledger on ZK with
> the result of a scan of the whole ledger.
> Basically it reads the ledger from the beginning up to the tail, until it
> is able to "read" entries, this way it is setting the 'fenced' flag on the
> ledger
> on every bookie and also it is able to detect the actual tail of the ledger
> (because the writer died and it was not able to flush metadata to ZK).
>
> The recovery read fails if it is not possible to read every entry from at
> least AQ bookies  (that is it allows WQ-QA read failures),
> and it does not hazard to "repair" (truncate) the ledger if it does not
> find enough bookies.
>
> I hope that helps
> Enrico
>

Reply via email to