[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sijie Guo updated BOOKKEEPER-447:
---------------------------------

    Attachment: BOOKKEEPER-447_bitset.diff

A draft patch based on the idea from the previous discussion: each index page 
uses a BitSet to track the sync status of its journal entries. A dirty page is 
flushed only when its BitSet is empty (all entries in the page have been synced 
to the journal).

Why a BitSet? The trouble is that although entries are added in order, that 
order is preserved by the client, not by the bookie. Entries can arrive at a 
bookie in any order because of retries and ensemble changes. So it is not safe 
to use something like a last entry id to track the progress of committing a 
ledger's entries to the journal.
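
To make the BitSet argument concrete, here is a minimal Java sketch. It is not 
the code in the attached diff: the class name PageSyncTracker, its methods and 
the fixed entries-per-page layout are all assumptions made for illustration.

```java
import java.util.BitSet;

// Hypothetical sketch, not the code in BOOKKEEPER-447_bitset.diff.
// One index page covers entries [firstEntry, firstEntry + entriesPerPage).
final class PageSyncTracker {
    private final long firstEntry;
    private final int entriesPerPage;
    // A set bit means: index slot written, journal entry not yet synced.
    private final BitSet unsynced;

    PageSyncTracker(long firstEntry, int entriesPerPage) {
        this.firstEntry = firstEntry;
        this.entriesPerPage = entriesPerPage;
        this.unsynced = new BitSet(entriesPerPage);
    }

    private int slot(long entryId) {
        long offset = entryId - firstEntry;
        if (offset < 0 || offset >= entriesPerPage) {
            throw new IllegalArgumentException("entry " + entryId + " is not on this page");
        }
        return (int) offset;
    }

    // Called when the entry's offset is put into the index page.
    synchronized void entryAdded(long entryId) {
        unsynced.set(slot(entryId));
    }

    // Called once the journal has synced the entry to disk.
    synchronized void journalSynced(long entryId) {
        unsynced.clear(slot(entryId));
    }

    // The page may be flushed only when no entry on it is waiting for the journal.
    synchronized boolean safeToFlush() {
        return unsynced.isEmpty();
    }
}
```

Suppose entries 102, 100 and 101 reach the bookie in that order and the journal 
has synced only 102 and 100. A "last synced entry id" of 102 would suggest the 
page is safe to flush, while the BitSet still has the bit for 101 set, so 
safeToFlush() returns false until 101 is synced as well.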

This patch also changes ledger flushing so that a ledger page cannot be updated 
while it is being flushed. Updating a page during a flush can cause the index 
of an unsynced journal entry to be flushed. This is an existing bug in the 
current ledger flush, and it can cause this issue even without the forced 
ledger flush when grabbing a clean page.
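
One way to picture the "no update while flushing" rule is the rough Java sketch 
below. It is only an illustration under my own assumptions: GuardedIndexPage 
and its methods are hypothetical names, and the real page layout and locking in 
the patch may differ.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch; names are illustrative and not taken from the patch.
final class GuardedIndexPage {
    private final long[] offsets;   // entry-log offset per slot, 0 = empty
    private final BitSet unsynced;  // slots whose journal entry is not synced yet

    GuardedIndexPage(int slots) {
        offsets = new long[slots];
        unsynced = new BitSet(slots);
    }

    // Update and flush share one monitor, so a slot cannot change mid-flush.
    synchronized void putOffset(int slot, long offset) {
        offsets[slot] = offset;
        unsynced.set(slot);
    }

    // Called once the journal has synced the entry stored in this slot.
    synchronized void journalSynced(int slot) {
        unsynced.clear(slot);
    }

    // Returns a copy of the page to write to the index file, or null if
    // some slot still refers to an entry the journal has not synced.
    synchronized long[] snapshotForFlush() {
        if (!unsynced.isEmpty()) {
            return null;
        }
        return Arrays.copyOf(offsets, offsets.length);
    }
}
```

Because the flusher works on a copy taken under the same monitor as 
putOffset(), an add that arrives during the disk write cannot leak an unsynced 
offset into the bytes being written.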

This patch passes the existing test cases. I haven't added a test case for it 
yet, and I think we need to cover more cases when adding tests.


                
> Bookie can fail to recover if index pages flushed before ledger flush 
> acknowledged
> ----------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-447
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-447
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>            Reporter: Yixue (Andrew) Zhu
>            Assignee: Ivan Kelly
>             Fix For: 4.2.0, 4.1.1
>
>         Attachments: 
> 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-lock-object-.patch, 
> 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-lock-object-.patch, 
> 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-semaphore-no.patch, 
> 0001-BOOKKEEPER-447-Throw-NoSuchEntry-if-entry-is-not-fou.patch, 
> BOOKKEEPER-447_bitset.diff, BOOKKEEPER-447.diff, 
> BOOKKEEPER-447_force_flush_entry_logger.patch, perf.png
>
>
> Bookie index page steal (LedgerCacheImpl::grabCleanPage) can cause the index 
> file to reflect unacknowledged entries (due to flushLedger). If the ledger and 
> entry fail to flush because the Bookkeeper server crashes, ledger recovery 
> cannot use the bookie afterward, because 
> InterleavedStorageLedger::getEntry throws IOException.
> If the ackSet bookies all experience this problem (DC environment), the 
> ledger will not be able to recover.
> The problem here is essentially a violation of WAL. One reasonable fix is to 
> track ledger flush progress (either per-ledger entry, or per-topic message). 
> Do not flush index pages which track entries whose ledger (log) has not been 
> flushed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
