[ https://issues.apache.org/jira/browse/BOOKKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487824#comment-13487824 ]

Robin Dhamankar commented on BOOKKEEPER-447:
--------------------------------------------

Sijie, not reading data that has not been persisted can be achieved without
having to delay inserting into the index or the log files. WAL enforcement would
associate a monotonically increasing sequence number with each batch of queued
entries written to the journal, and use this sequence number to detect whether all
entries in an index page have already been persisted. The same check that is
used before an index page is persisted can also be applied when the index is read,
if we want to isolate readers from data that has not yet been persisted. In the
common case, journal flushes will be ahead of subscriber consumption (reads), so
this should add little or no overhead.
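A minimal sketch of that check, assuming a hypothetical `JournalProgress` that hands out batch sequence numbers and a hypothetical `IndexPage` that remembers the newest batch it references (neither is BookKeeper's actual API). It also assumes the journal makes batches durable in sequence order, so a persisted sequence number s implies that every batch up to s is durable.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical journal-side bookkeeping: batch sequence numbers and durability progress. */
class JournalProgress {
    private final AtomicLong nextSeq = new AtomicLong(0);
    // -1 means no batch has been made durable yet.
    private final AtomicLong persistedSeq = new AtomicLong(-1);

    /** Assign a monotonically increasing sequence number to a new batch of queued entries. */
    long assignBatchSeq() {
        return nextSeq.getAndIncrement();
    }

    /** Called once the journal write/fsync for batch `seq` has completed. */
    void markPersisted(long seq) {
        // Only ever move the durable watermark forward.
        persistedSeq.accumulateAndGet(seq, Math::max);
    }

    long persistedSeq() {
        return persistedSeq.get();
    }
}

/** Hypothetical index page that tracks the newest journal batch among its entries. */
class IndexPage {
    private long maxBatchSeq = -1;

    /** Record that an entry from batch `batchSeq` was inserted into this page. */
    void recordEntry(long batchSeq) {
        maxBatchSeq = Math.max(maxBatchSeq, batchSeq);
    }

    /** WAL rule: the page may be flushed (or read, for isolation) only if all its entries are durable. */
    boolean allEntriesDurable(JournalProgress journal) {
        return maxBatchSeq <= journal.persistedSeq();
    }
}
```

In this sketch the same `allEntriesDurable` check would gate both the page-steal path (before writing a dirty page to the index file) and, optionally, reads that must not observe unpersisted entries. Index insertion itself is never delayed; only the flush or read waits on the journal watermark.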

Flavio, I don't think we want to couple this with 429 and 432. Those are
performance optimizations; this is a correctness fix.
                
> Bookie can fail to recover if index pages flushed before ledger flush acknowledged
> ----------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-447
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-447
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>            Reporter: Yixue (Andrew) Zhu
>            Assignee: Robin Dhamankar
>              Labels: patch
>             Fix For: 4.2.0
>
>
> Bookie index page steal (LedgerCacheImpl::grabCleanPage) can cause the index
> file to reflect unacknowledged entries (due to flushLedger). Suppose the ledger
> and entry then fail to flush because the BookKeeper server crashes: ledger
> recovery will not be able to use that bookie afterward, because
> InterleavedStorageLedger::getEntry throws an IOException.
> If all bookies in the ack set experience this problem (in a DC environment),
> the ledger cannot be recovered.
> The problem here is essentially a violation of write-ahead logging (WAL). One
> reasonable fix is to track ledger flush progress (either per ledger entry or
> per topic message) and not flush index pages that track entries whose ledger
> (log) data has not been flushed.
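The per-ledger variant suggested in the issue description could look roughly like the sketch below. `LedgerFlushTracker` and its methods are hypothetical names, not BookKeeper classes, and the sketch assumes entries of a single ledger become durable in entry-id order, so the highest persisted entry id covers all lower ones.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical tracker of the highest durable entry id per ledger. */
class LedgerFlushTracker {
    private final Map<Long, Long> lastPersistedEntry = new HashMap<>();

    /** Called when the log record for (ledgerId, entryId) is known to be durable. */
    synchronized void onEntryPersisted(long ledgerId, long entryId) {
        // Keep the maximum so a late or duplicate callback cannot move progress backward.
        lastPersistedEntry.merge(ledgerId, entryId, Long::max);
    }

    /**
     * An index page whose highest entry id is lastEntryOnPage may be written
     * only if every entry it references is already durable in the log.
     */
    synchronized boolean canFlushPage(long ledgerId, long lastEntryOnPage) {
        Long persisted = lastPersistedEntry.get(ledgerId);
        return persisted != null && lastEntryOnPage <= persisted;
    }
}
```

Compared with a single journal-wide sequence number, this keeps one counter per ledger, which costs more memory but lets a page for a quiet ledger be flushed without waiting on unrelated ledgers.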

--
This message is automatically generated by JIRA.
