[ https://issues.apache.org/jira/browse/BOOKKEEPER-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13497285#comment-13497285 ]

Ivan Kelly commented on BOOKKEEPER-447:
---------------------------------------

I think the root problem here is that the entrylog must be flushed
monolithically, while the index files are flushed individually. This means that
to free space for the index files, we need to flush the whole entrylog, or else
we get the problem described. This, at its core, is the same problem.
Basically, if all entries are interleaved, then it's impossible for a bookie to
flush all the entries associated with an index page without flushing everything
around them.

I had been thinking of a solution for BOOKKEEPER-432, which is somewhat similar 
to Aniruddha's.

Basically, we have a SlabAllocator that hands out blocks of memory, maybe 8k in
size. Each ledger has two slabs, the entrylog slab and the index slab. Entries
for a ledger are written to the entrylog slab, and then the offset is written
to the index slab.
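To make the two-slab layout concrete, here is a minimal Java sketch. It assumes a hypothetical SlabAllocator that hands out fixed 8k ByteBuffers and a hypothetical LedgerSlabs holder; neither is an existing BookKeeper class. It also assumes entries arrive in entryId order, so the i-th index slot belongs to the i-th entry. Each entry is stored length-prefixed in the entrylog slab, and the index slab records the offset within that slab, since the real file offset is not known until flush.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: SlabAllocator, LedgerSlabs and SLAB_SIZE are illustrative
// names, not existing BookKeeper classes. Entries are assumed to be added in
// entryId order, so slot i of the index slab belongs to the i-th entry.
final class SlabAllocator {
    static final int SLAB_SIZE = 8 * 1024; // "maybe 8k in size"

    ByteBuffer allocate() {
        return ByteBuffer.allocate(SLAB_SIZE);
    }
}

final class LedgerSlabs {
    final ByteBuffer entryLogSlab; // length-prefixed entry bytes, in write order
    final ByteBuffer indexSlab;    // one long per entry: offset within entryLogSlab

    LedgerSlabs(SlabAllocator allocator) {
        this.entryLogSlab = allocator.allocate();
        this.indexSlab = allocator.allocate();
    }

    /** Buffers one entry; returns false when a slab is full and must be flushed first. */
    boolean addEntry(byte[] entry) {
        int needed = Integer.BYTES + entry.length;
        if (entryLogSlab.remaining() < needed || indexSlab.remaining() < Long.BYTES) {
            return false;
        }
        long relativeOffset = entryLogSlab.position();
        entryLogSlab.putInt(entry.length); // length prefix
        entryLogSlab.put(entry);
        indexSlab.putLong(relativeOffset); // the real file offset is only known at flush
        return true;
    }
}
```

Keeping the offsets relative means nothing in the index slab depends on where the ledger's data eventually lands in the entrylog file.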

For a normal flush, we go through all ledgers, flush the entrylog slab (long 
sequential write), and then the index slab for each of them (using the offset 
from the entrylog flush to calculate the real offsets). 
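A normal flush could look roughly like the following minimal sketch. It assumes one reading of the description: all entrylog slabs are appended back to back first (the long sequential write), and only then are the index slabs written, with each relative offset shifted by the position at which its slab landed. The entrylog file and index files are modelled in memory, and NormalFlushSketch and its members are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the "normal flush". The shared entrylog file and the
// per-ledger index files are modelled in memory; all names are illustrative.
final class NormalFlushSketch {
    static final class Slabs {
        final ByteArrayOutputStream entryLogSlab = new ByteArrayOutputStream();
        final List<Long> relativeOffsets = new ArrayList<>(); // offsets within entryLogSlab

        void add(byte[] entry) {
            relativeOffsets.add((long) entryLogSlab.size());
            entryLogSlab.write(entry, 0, entry.length);
        }
    }

    final Map<Long, Slabs> ledgers = new TreeMap<>();
    final ByteArrayOutputStream entryLog = new ByteArrayOutputStream(); // stand-in for the entrylog file
    final Map<Long, List<Long>> index = new TreeMap<>();                // ledgerId -> real offsets

    void addEntry(long ledgerId, byte[] entry) {
        ledgers.computeIfAbsent(ledgerId, id -> new Slabs()).add(entry);
    }

    void flushAll() {
        // Pass 1: append every ledger's entrylog slab (one long sequential write),
        // remembering where each slab starts in the log.
        Map<Long, Long> base = new TreeMap<>();
        for (Map.Entry<Long, Slabs> e : ledgers.entrySet()) {
            base.put(e.getKey(), (long) entryLog.size());
            byte[] bytes = e.getValue().entryLogSlab.toByteArray();
            entryLog.write(bytes, 0, bytes.length);
        }
        // A real bookie would force the entrylog to disk here, before any index write.
        // Pass 2: write the index slabs, turning relative offsets into real ones.
        for (Map.Entry<Long, Slabs> e : ledgers.entrySet()) {
            List<Long> offsets = index.computeIfAbsent(e.getKey(), id -> new ArrayList<>());
            long start = base.get(e.getKey());
            for (long rel : e.getValue().relativeOffsets) {
                offsets.add(start + rel);
            }
        }
        ledgers.clear(); // every slab can now go back to the allocator
    }
}
```

Because each ledger's entries are contiguous in the log, a ledger's index entries are just its slab's base plus the offsets buffered earlier.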

For a "reclaim me some memory" flush, we can flush a single ledger's entrylog
slab, and then its index slab. Of course, the implementation would be more
complex, but the basic idea is that, for a single ledger, the entrylog segment
is independent until the point that it is on the disk.
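The memory-reclaim path can be sketched the same way, assuming the same in-memory stand-ins as above and hypothetical names (ReclaimSketch, reclaim, bufferedBytes). Only the chosen ledger is flushed: its entrylog segment is appended first, and only afterwards are its index offsets written, so the index never points at data that is not yet in the log. All other ledgers stay buffered.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "reclaim me some memory": only one ledger's slabs are
// flushed, entrylog segment first, then its index. Names are illustrative.
final class ReclaimSketch {
    static final class Slabs {
        final ByteArrayOutputStream entryLogSlab = new ByteArrayOutputStream();
        final List<Long> relativeOffsets = new ArrayList<>(); // offsets within entryLogSlab

        void add(byte[] entry) {
            relativeOffsets.add((long) entryLogSlab.size());
            entryLogSlab.write(entry, 0, entry.length);
        }
    }

    final Map<Long, Slabs> ledgers = new HashMap<>();
    final ByteArrayOutputStream entryLog = new ByteArrayOutputStream(); // stand-in for the entrylog file
    final Map<Long, List<Long>> index = new HashMap<>();                // ledgerId -> real offsets

    void addEntry(long ledgerId, byte[] entry) {
        ledgers.computeIfAbsent(ledgerId, id -> new Slabs()).add(entry);
    }

    /** Total entry bytes still held in memory across all ledgers. */
    long bufferedBytes() {
        long total = 0;
        for (Slabs s : ledgers.values()) {
            total += s.entryLogSlab.size();
        }
        return total;
    }

    /** Flushes a single ledger; every other ledger's slabs stay in memory untouched. */
    void reclaim(long ledgerId) {
        Slabs s = ledgers.remove(ledgerId);
        if (s == null) {
            return; // nothing buffered for this ledger
        }
        long start = entryLog.size();
        byte[] bytes = s.entryLogSlab.toByteArray();
        entryLog.write(bytes, 0, bytes.length);
        // The segment must be durable before the index refers to it (WAL ordering);
        // a real bookie would force the entrylog here.
        List<Long> offsets = index.computeIfAbsent(ledgerId, id -> new ArrayList<>());
        for (long rel : s.relativeOffsets) {
            offsets.add(start + rel);
        }
    }
}
```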

                
> Bookie can fail to recover if index pages flushed before ledger flush acknowledged
> ----------------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-447
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-447
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>            Reporter: Yixue (Andrew) Zhu
>            Assignee: Robin Dhamankar
>              Labels: patch
>             Fix For: 4.2.0, 4.1.1
>
>         Attachments: BOOKKEEPER-447.diff
>
>
> Bookie index page steal (LedgerCacheImpl::grabCleanPage) can cause an index
> file to reflect unacknowledged entries (due to flushLedger). Suppose the ledger
> and entries fail to flush because the BookKeeper server crashes: ledger
> recovery will then be unable to use the bookie afterwards, because
> InterleavedStorageLedger::getEntry throws an IOException.
> If the ackSet bookies all experience this problem (DC environment), the
> ledger will not be able to recover.
> The problem here is essentially a violation of write-ahead logging (WAL). One
> reasonable fix is to track ledger flush progress (either per-ledger entry, or
> per-topic message), and not flush index pages that track entries whose ledger
> (log) has not been flushed.
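The fix proposed in the description (track flush progress per ledger entry and never flush an index page ahead of its log) can be sketched as a per-ledger watermark consulted when stealing a page. This is a minimal sketch only; FlushWatermark, IndexPage and pickVictim are hypothetical names, not LedgerCacheImpl's actual API, and each page is assumed to know the highest entryId it holds an offset for.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the fix proposed in the description; FlushWatermark,
// IndexPage and pickVictim are illustrative, not LedgerCacheImpl's real API.
final class FlushWatermark {
    static final class IndexPage {
        final long ledgerId;
        final long maxEntryId; // highest entryId this page holds an offset for
        final boolean dirty;

        IndexPage(long ledgerId, long maxEntryId, boolean dirty) {
            this.ledgerId = ledgerId;
            this.maxEntryId = maxEntryId;
            this.dirty = dirty;
        }
    }

    // ledgerId -> highest entryId known to be persisted in the entry log
    private final Map<Long, Long> persisted = new HashMap<>();

    /** Called once the entry log covering ledgerId up to lastEntryId is on disk. */
    void onEntryLogFlushed(long ledgerId, long lastEntryId) {
        persisted.merge(ledgerId, lastEntryId, Long::max); // never moves backwards
    }

    /** A dirty page may be written only if every entry it points at is persisted. */
    boolean safeToFlush(long ledgerId, long maxEntryIdOnPage) {
        return maxEntryIdOnPage <= persisted.getOrDefault(ledgerId, -1L);
    }

    /** Chooses a page to steal: clean pages always, dirty pages only if WAL-safe. */
    Optional<IndexPage> pickVictim(List<IndexPage> pages) {
        for (IndexPage p : pages) {
            if (!p.dirty || safeToFlush(p.ledgerId, p.maxEntryId)) {
                return Optional.of(p);
            }
        }
        return Optional.empty(); // caller must flush the entry log before stealing
    }
}
```

When no page qualifies, the caller has to flush the entry log first, which, because entries are interleaved, is exactly the monolithic flush discussed in the comment above.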

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
