[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487613#comment-13487613
 ] 

Flavio Junqueira commented on BOOKKEEPER-447:
---------------------------------------------

bq. the proposal of forcing entry log to be flushed before index would work, 
though the data is force flushed unnecessarily.

[[email protected]] We already force it to disk, so there is no extra penalty. 
Check 
InterleavedLedgerStore.flush()->entryLogger.flush()->logChannel.flush(true)

bq. I think the problem here is ledger storage flushed before journal flushed.

[~hustlmsp] Agreed, and my proposal does not prevent us from flushing to the 
ledger device before we do it to the journal, but it makes sure that if we do, 
we won't get the IOException. This change requires no additional code; we only 
need to swap the order, so it is very simple.
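
A minimal sketch of why the order matters, assuming a toy in-memory model rather than the real bookie classes. FlushOrderSketch and every method on it are hypothetical names made up for illustration. The sets stand in for what is on disk, and a crash is simulated by returning between the two flush steps.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical toy model of a ledger store; not BookKeeper's actual code. */
public class FlushOrderSketch {
    // In-memory state that has not reached disk yet.
    private final Set<Long> bufferedEntries = new HashSet<>();
    private final Set<Long> dirtyIndex = new HashSet<>();
    // Simulated on-disk state.
    private final Set<Long> entryLogOnDisk = new HashSet<>();
    private final Set<Long> indexOnDisk = new HashSet<>();

    /** Adds an entry: it is buffered for the entry log and dirties the index. */
    void addEntry(long entryId) {
        bufferedEntries.add(entryId);
        dirtyIndex.add(entryId);
    }

    /** Moves all buffered entries to the simulated entry log on disk. */
    void flushEntryLog() {
        entryLogOnDisk.addAll(bufferedEntries);
        bufferedEntries.clear();
    }

    /** Moves all dirty index records to the simulated index on disk. */
    void flushIndex() {
        indexOnDisk.addAll(dirtyIndex);
        dirtyIndex.clear();
    }

    /**
     * Proposed order: entry log first, index second.
     * crashBetweenSteps simulates the server dying after the first step.
     */
    void flush(boolean crashBetweenSteps) {
        flushEntryLog();
        if (crashBetweenSteps) {
            return;
        }
        flushIndex();
    }

    /** True if every entry the on-disk index points to exists in the on-disk entry log. */
    boolean indexIsReadable() {
        return entryLogOnDisk.containsAll(indexOnDisk);
    }

    public static void main(String[] args) {
        FlushOrderSketch store = new FlushOrderSketch();
        store.addEntry(1L);
        store.addEntry(2L);
        store.flush(true); // crash after the entry log flush, before the index flush
        System.out.println("index readable after crash: " + store.indexIsReadable());
    }
}
```

In this model a crash between the two steps leaves the index behind the entry log, which is harmless. Calling flushIndex() before flushEntryLog() and crashing in between leaves index records that point at entries missing from the log, which corresponds to the getEntry failure described in the issue.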

bq. it violates the contract for a bookie server, which acks an entry after the 
entry has been committed to journal.

It does not violate the contract, because I'm not suggesting that we ack after 
flushing to the ledger device. We keep acking only once the entry is persisted 
in the journal. 
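
To make the contract concrete, here is a small sketch under the same caveat: AckAfterJournalSketch and its methods are hypothetical and only model the rule that an ack depends on the journal, never on the ledger device.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical toy model of the ack rule; not BookKeeper's actual code. */
public class AckAfterJournalSketch {
    private final Set<Long> journalOnDisk = new HashSet<>();
    private final Set<Long> ledgerDeviceOnDisk = new HashSet<>();
    private final Set<Long> acked = new HashSet<>();

    /** Flushing to the ledger device may happen at any time and never triggers an ack. */
    void flushToLedgerDevice(long entryId) {
        ledgerDeviceOnDisk.add(entryId);
    }

    /** The ack is sent only once the entry is persisted in the journal. */
    void persistToJournal(long entryId) {
        journalOnDisk.add(entryId);
        acked.add(entryId);
    }

    /** Reports whether the given entry has been acknowledged. */
    boolean isAcked(long entryId) {
        return acked.contains(entryId);
    }

    public static void main(String[] args) {
        AckAfterJournalSketch bookie = new AckAfterJournalSketch();
        bookie.flushToLedgerDevice(42L); // ledger device first, still no ack
        System.out.println("acked before journal: " + bookie.isAcked(42L));
        bookie.persistToJournal(42L);
        System.out.println("acked after journal: " + bookie.isAcked(42L));
    }
}
```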



                
> Bookie can fail to recover if index pages flushed before ledger flush 
> acknowledged
> ----------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-447
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-447
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>            Reporter: Yixue (Andrew) Zhu
>            Assignee: Robin Dhamankar
>              Labels: patch
>             Fix For: 4.2.0
>
>
> Bookie index page steal (LedgerCacheImpl::grabCleanPage) can cause index file 
> to reflect unacknowledged entries (due to flushLedger). If the ledger and 
> entry then fail to flush because the Bookkeeper server crashes, ledger 
> recovery cannot use the bookie afterward, because 
> InterleavedStorageLedger::getEntry throws IOException.
> If the ackSet bookies all experience this problem (DC environment), the 
> ledger will not be able to recover.
> The problem here is essentially a violation of WAL. One reasonable fix is to 
> track ledger flush progress (either per-ledger entry, or per-topic message). 
> Do not flush index pages that track entries whose ledger (log) has not been 
> flushed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
