[ https://issues.apache.org/jira/browse/BOOKKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Kelly updated BOOKKEEPER-447:
----------------------------------
Attachment: 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-lock-object-.patch
The new patch gets rid of the semaphore and just uses notify()/wait(). The
semaphore was unnecessary because LedgerCacheImpl does its own bookkeeping on
which pages are clean and which are dirty.
Basically, if there are no clean pages available, we wait for up to 100 ms and
then check again whether we can find any. It's close to busy waiting, but the
100 ms wait stops it from going into a tight loop.
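
For illustration, the timed wait()/notify() approach described above could
look roughly like the sketch below. This is not the code from the attached
patch; the class name CleanPagePool, the markClean() method, and the use of
integer page ids are hypothetical stand-ins for LedgerCacheImpl's own
clean/dirty bookkeeping.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a pool of clean page ids guarded by a lock object.
final class CleanPagePool {
    // Upper bound on each wait, so a missed notify cannot block forever.
    private static final long WAIT_MS = 100;

    private final Object lock = new Object();
    private final Deque<Integer> cleanPages = new ArrayDeque<>();

    // Blocks until a clean page is available, re-checking at least every 100 ms.
    int grabCleanPage() throws InterruptedException {
        synchronized (lock) {
            while (cleanPages.isEmpty()) {
                // Returns on notify, on timeout, or spuriously; the loop re-checks.
                lock.wait(WAIT_MS);
            }
            return cleanPages.pop();
        }
    }

    // Called once a page has been flushed and is clean again.
    void markClean(int pageId) {
        synchronized (lock) {
            cleanPages.push(pageId);
            lock.notifyAll();
        }
    }
}
```

The condition is re-checked in a loop after every wake-up, so the 100 ms
timeout only bounds how long a waiter can sleep; it does not replace the
check itself.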
> Bookie can fail to recover if index pages flushed before ledger flush
> acknowledged
> ----------------------------------------------------------------------------------
>
> Key: BOOKKEEPER-447
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-447
> Project: Bookkeeper
> Issue Type: Bug
> Components: bookkeeper-server
> Affects Versions: 4.2.0
> Reporter: Yixue (Andrew) Zhu
> Assignee: Ivan Kelly
> Labels: patch
> Fix For: 4.2.0, 4.1.1
>
> Attachments:
> 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-lock-object-.patch,
> 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-semaphore-no.patch,
> BOOKKEEPER-447.diff, perf.png
>
>
> Bookie index page steal (LedgerCacheImpl::grabCleanPage) can cause the index
> file to reflect unacknowledged entries (due to flushLedger). If the ledger and
> entry then fail to flush because the Bookkeeper server crashes, ledger
> recovery will not be able to use the bookie afterward, because
> InterleavedStorageLedger::getEntry throws IOException.
> If all bookies in the ackSet experience this problem (DC environment), the
> ledger will not be able to recover.
> The problem here is essentially a violation of WAL. One reasonable fix is to
> track ledger flush progress (either per-ledger entry, or per-topic message)
> and not flush index pages that track entries whose ledger (log) has not been
> flushed.
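
As a rough illustration of the fix proposed in the description above (track
flush progress and hold back index pages that are ahead of it), a minimal
sketch might look like this. The FlushTracker class and its method names are
hypothetical and are not part of Bookkeeper; it assumes per-ledger entry ids
become durable in increasing order.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: gate index page flushes on entry log flush progress.
final class FlushTracker {
    // ledgerId -> highest entry id known to be flushed to the entry log.
    private final Map<Long, Long> lastFlushedEntry = new HashMap<>();

    // Record that entries up to entryId of this ledger are durable.
    synchronized void entryLogFlushed(long ledgerId, long entryId) {
        lastFlushedEntry.merge(ledgerId, entryId, Long::max);
    }

    // An index page may be flushed only if every entry it tracks is durable.
    synchronized boolean canFlushIndexPage(long ledgerId, long maxEntryOnPage) {
        Long flushed = lastFlushedEntry.get(ledgerId);
        return flushed != null && maxEntryOnPage <= flushed;
    }
}
```

With a check like this, a page steal would skip (or first force a log flush
for) any dirty page whose highest entry id is beyond the recorded flush point,
which preserves the write-ahead ordering.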