[jira] [Commented] (BOOKKEEPER-447) Bookie can fail to recover if index pages flushed before ledger flush acknowledged

Ivan Kelly (JIRA) Mon, 03 Dec 2012 06:38:00 -0800

    [ 
https://issues.apache.org/jira/browse/BOOKKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508765#comment-13508765
 ]


Ivan Kelly commented on BOOKKEEPER-447:
---------------------------------------

!perf.png!

The benchmark was run with the bkvhbase benchmark [1]. Entry size was 100 and 
each run lasted 5 minutes. I ran each ledger count 3 times.

The graph is quite bumpy, but it shows that, with the attached patch, 
performance is actually better for 1, 10 & 10000 ledgers, and a bit worse for 
100 & 1000 ledgers. Previous tests run against the complete bookie gave a max 
throughput of 108k[2], so all these numbers are much better. 

[1] https://github.com/ivankelly/bkvhbase
[2] Running against a complete bookie means having to write to the WAL first. 
This slows us down, as we lose a degree of batching.
                
> Bookie can fail to recover if index pages flushed before ledger flush 
> acknowledged
> ----------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-447
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-447
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>            Reporter: Yixue (Andrew) Zhu
>            Assignee: Yixue (Andrew) Zhu
>              Labels: patch
>             Fix For: 4.2.0, 4.1.1
>
>         Attachments: 
> 0001-BOOKKEEPER-447-LedgerCacheImpl-waits-on-semaphore-no.patch, 
> BOOKKEEPER-447.diff, perf.png
>
>
> Bookie index page steal (LedgerCacheImpl::grabCleanPage) can cause the index 
> file to reflect unacknowledged entries (due to flushLedger). If the ledger and 
> entry fail to flush because the Bookkeeper server crashes, ledger recovery 
> will not be able to use the bookie afterward, because 
> InterleavedStorageLedger::getEntry throws IOException.
> If the ackSet bookies all experience this problem (DC environment), the 
> ledger will not be able to recover.
> The problem here is essentially a violation of WAL. One reasonable fix is to 
> track ledger flush progress (either per-ledger entry, or per-topic message). 
> Do not flush index pages which track entries whose ledger (log) has not been 
> flushed.
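
The fix proposed in the description (track ledger flush progress and hold back 
index pages that reference unflushed entries) could look roughly like the 
sketch below. It is only an illustration of the idea, not the attached patch 
and not BookKeeper code: the class FlushGate and its methods 
markLedgerFlushed and canFlushIndexPage are hypothetical names, and it assumes 
that flush progress is tracked per ledger as a highest durable entry id.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not BookKeeper code): allow an index page to be
 * flushed only after the entries it references are durable in the entry log.
 */
public class FlushGate {
    // ledgerId -> highest entry id known to be flushed to the entry log.
    // Tracking progress per ledger is an assumption made for this sketch.
    private final Map<Long, Long> durableUpTo = new HashMap<>();

    /** Record that the entry log is flushed up to lastEntryId for this ledger. */
    public synchronized void markLedgerFlushed(long ledgerId, long lastEntryId) {
        // Keep the maximum, so a late or repeated report never moves progress back.
        durableUpTo.merge(ledgerId, lastEntryId, Long::max);
    }

    /** An index page may be flushed only if every entry on it is already durable. */
    public synchronized boolean canFlushIndexPage(long ledgerId, long maxEntryIdOnPage) {
        Long durable = durableUpTo.get(ledgerId);
        return durable != null && maxEntryIdOnPage <= durable;
    }

    public static void main(String[] args) {
        FlushGate gate = new FlushGate();
        // Nothing flushed yet for ledger 1, so its index page must wait.
        System.out.println(gate.canFlushIndexPage(1L, 5L));   // false
        gate.markLedgerFlushed(1L, 9L);
        // Entries 0..9 are durable: a page ending at entry 5 is safe to flush.
        System.out.println(gate.canFlushIndexPage(1L, 5L));   // true
        // A page that already tracks entry 12 is ahead of the log and must wait.
        System.out.println(gate.canFlushIndexPage(1L, 12L));  // false
    }
}
```

With a check like this in the page-steal path, a dirty page that fails the 
check would be skipped or would wait for the ledger flush, so the index file 
never gets ahead of the log.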

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
