[
https://issues.apache.org/jira/browse/BOOKKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487755#comment-13487755
]
Sijie Guo commented on BOOKKEEPER-447:
--------------------------------------
Revisiting the steps for adding an entry in the bookie server:
1) add the entry to ledger storage (add it to the entry logger, then update the
ledger index entry);
2) add the entry to the journal queue;
3) the journal thread flushes the journal queue to commit the entry to disk;
4) in the journal's add callback, the bookie responds to the client.
So the entry is available for read after step 1), even though it has not yet
been committed to the journal. This behavior is OK for BookKeeper, since
BookKeeper has the last-confirmed hint guarantee.
But it is not really safe to make an entry available for read before it is
committed to the journal. Imagine a K/V store (not BookKeeper) that first adds
a key to memory for reads and then commits it to the journal for persistence.
Once the key is in memory it is readable, so a client may read its value. But
if a crash happens before the journal commit, the store restarts and the key
is gone. The client cannot read the key again, which leaves it in an
inconsistent state.
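To make the hazard concrete, here is a minimal sketch of such a K/V store. It
is a toy, not BookKeeper code: the class ReadBeforeCommitDemo and all of its
methods are hypothetical names chosen for illustration, and the "journal" is
just a list that is assumed to survive the simulated crash.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy K/V store (hypothetical) that exposes writes before journaling. */
class ReadBeforeCommitDemo {
    // Stands in for the on-disk journal; survives the simulated crash.
    private final List<String[]> journal = new ArrayList<>();
    // Writes that are already readable but not yet in the journal.
    private final List<String[]> pending = new ArrayList<>();
    // In-memory table served to readers; lost on crash.
    private Map<String, String> memory = new HashMap<>();

    /** Unsafe order: make the key readable first, journal it later. */
    void put(String key, String value) {
        memory.put(key, value);
        pending.add(new String[] {key, value});
    }

    /** What the journal thread would do: persist everything queued. */
    void flushJournal() {
        journal.addAll(pending);
        pending.clear();
    }

    String get(String key) {
        return memory.get(key);
    }

    /** Simulated crash and restart: memory is rebuilt from the journal. */
    void crashAndRestart() {
        pending.clear();
        memory = new HashMap<>();
        for (String[] kv : journal) {
            memory.put(kv[0], kv[1]);
        }
    }

    public static void main(String[] args) {
        ReadBeforeCommitDemo store = new ReadBeforeCommitDemo();
        store.put("k", "v");
        System.out.println("before crash: " + store.get("k"));
        store.crashAndRestart(); // crash happens before flushJournal()
        System.out.println("after restart: " + store.get("k"));
    }
}
```

The client sees the value before the crash and null after the restart, which
is the inconsistency described above. Calling flushJournal() before the crash
would preserve the key.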
A better sequence for adding an entry in a journal-based storage would be:
1) add the entry to the journal queue first;
2) the journal thread commits the add operation to the journal;
3) in the journal's add callback, put the addEntry operation in a write
thread's queue;
4) the write thread adds the entry to ledger storage;
5) respond to the client.
With this sequence, an entry becomes available for read only after it has been
safely committed to disk. That avoids the inconsistent state described above
and also addresses this issue.
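The journal-first ordering could be sketched roughly as below. This is only an
illustration under assumptions: JournalFirstBookie, addEntry, readEntry and
isJournaled are hypothetical names, the two maps stand in for the journal and
the ledger storage, and the single-threaded executors stand in for the journal
thread and the write thread. It is not the actual bookie code.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Toy bookie (hypothetical) that journals an entry before exposing it. */
class JournalFirstBookie {
    private final ExecutorService journalThread =
            Executors.newSingleThreadExecutor();
    private final ExecutorService writeThread =
            Executors.newSingleThreadExecutor();
    // Stand-in for the durable journal.
    private final Map<Long, String> journal = new ConcurrentHashMap<>();
    // Stand-in for ledger storage; only this map is visible to readers.
    private final Map<Long, String> ledgerStorage = new ConcurrentHashMap<>();

    /** The returned future completes when the client may be answered. */
    CompletableFuture<Long> addEntry(long entryId, String data) {
        return CompletableFuture
                // steps 1-2: queue to the journal thread, which commits it
                .runAsync(() -> journal.put(entryId, data), journalThread)
                // steps 3-4: hand off to the write thread, which adds the
                // entry to ledger storage
                .thenRunAsync(() -> ledgerStorage.put(entryId, data),
                        writeThread)
                // step 5: respond to the client
                .thenApply(ignored -> entryId);
    }

    String readEntry(long entryId) {
        return ledgerStorage.get(entryId);
    }

    boolean isJournaled(long entryId) {
        return journal.containsKey(entryId);
    }

    void shutdown() {
        journalThread.shutdown();
        writeThread.shutdown();
    }

    public static void main(String[] args) {
        JournalFirstBookie bookie = new JournalFirstBookie();
        bookie.addEntry(1L, "payload").join();
        System.out.println("readable: " + bookie.readEntry(1L)
                + ", journaled: " + bookie.isJournaled(1L));
        bookie.shutdown();
    }
}
```

Because the ledger storage update is chained after the journal commit, any
entry a reader can see has already been journaled. The journal callback only
enqueues work for the write thread, so the journal thread itself is not slowed
down by the ledger storage update.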
Performance consideration:
the original steps: the latency of an addEntry operation is (latency of
adding the entry to ledger storage) + (latency of committing the entry to the
journal).
the changed steps: the latency of an addEntry operation is (latency of
committing the entry to the journal) + (latency of adding the entry to ledger
storage).
Since we don't add the entry to ledger storage directly in the journal commit
callback, but only put the addEntry operation in a write thread's queue (the
improvement introduced in BOOKKEEPER-429), the latency of committing an entry
to the journal stays the same as before. So the total latency of an addEntry
operation remains the same.
Complexity:
it only needs to change the order in which an entry is added, which doesn't
introduce any other code. (I am assuming that we will have a separate write
thread with a queue for pending addEntry operations, which would be introduced
in BOOKKEEPER-429.) The benefit of this change is that the behavior stays
predictable even when crashes occur.
> Bookie can fail to recover if index pages flushed before ledger flush
> acknowledged
> ----------------------------------------------------------------------------------
>
> Key: BOOKKEEPER-447
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-447
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-server
> Affects Versions: 4.2.0
> Reporter: Yixue (Andrew) Zhu
> Assignee: Robin Dhamankar
> Labels: patch
> Fix For: 4.2.0
>
>
> Bookie index page steal (LedgerCacheImpl::grabCleanPage) can cause the index
> file to reflect unacknowledged entries (due to flushLedger). If the ledger and
> entry fail to flush because the BookKeeper server crashes, ledger recovery
> cannot use the bookie afterward, because
> InterleavedStorageLedger::getEntry throws IOException.
> If the ackSet bookies all experience this problem (DC environment), the
> ledger will not be able to recover.
> The problem here is essentially a violation of WAL. One reasonable fix is to
> track ledger flush progress (either per-ledger entry, or per-topic message)
> and not flush index pages that track entries whose ledger (log) has not been
> flushed.