Yixue (Andrew) Zhu created BOOKKEEPER-447:
---------------------------------------------
Summary: Bookie can fail to recover if index pages flushed before
ledger flush acknowledged
Key: BOOKKEEPER-447
URL: https://issues.apache.org/jira/browse/BOOKKEEPER-447
Project: Bookkeeper
Issue Type: Bug
Components: bookkeeper-server
Affects Versions: 4.2.0
Reporter: Yixue (Andrew) Zhu
Fix For: 4.2.0
Bookie index page stealing (LedgerCacheImpl::grabCleanPage) can cause the index
file to reflect unacknowledged entries (due to flushLedger). If the ledger and
entry then fail to flush because the Bookkeeper server crashes, ledger recovery
cannot use the bookie afterward, because InterleavedStorageLedger::getEntry
throws IOException.
If all bookies in the ackSet experience this problem (DC environment), the
ledger cannot be recovered.
The problem here is essentially a violation of WAL (write-ahead logging). One
reasonable fix is to track ledger flush progress (either per ledger entry or
per topic message) and not flush index pages that track entries whose ledger
(log) has not been flushed.
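A minimal sketch of the proposed gating, to make the idea concrete. It is
not BookKeeper code: the class LedgerFlushTracker and its methods are
hypothetical names, and it assumes that entries of a ledger become durable in
entry-id order, so that one "flushed through" mark per ledger is enough.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not part of BookKeeper): tracks per-ledger flush progress. */
final class LedgerFlushTracker {
    // ledgerId -> highest entry id known to be durable in the ledger (log).
    private final Map<Long, Long> flushedThrough = new ConcurrentHashMap<>();

    /** Record that the log flush covering entries up to entryId has completed. */
    void markFlushed(long ledgerId, long entryId) {
        // Keep the maximum so a late or repeated report never moves the mark back.
        flushedThrough.merge(ledgerId, entryId, Math::max);
    }

    /**
     * An index page may be written out (or stolen) only if every entry it
     * tracks is already durable; otherwise the index would get ahead of the log.
     */
    boolean canFlushIndexPage(long ledgerId, long lastEntryOnPage) {
        Long durable = flushedThrough.get(ledgerId);
        return durable != null && lastEntryOnPage <= durable;
    }
}
```

Under this sketch, a page-steal path would call canFlushIndexPage before
writing a dirty page and would either pick another page or wait for the log
flush when it returns false.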