[ https://issues.apache.org/jira/browse/BOOKKEEPER-432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488199#comment-13488199 ]

Ivan Kelly commented on BOOKKEEPER-432:
---------------------------------------

Are the intermediate levels of the btree (between the root and the leaves) 
stored on disk, or only the leaves? Do you also maintain separate index files 
per ledger? Sorry for all the questions, I'm trying to get the full picture in 
my head :)

{quote}
If you mean the Ledger Log by write-ahead-log, yes, we will not read them except 
during recovery. I am not proposing caching it.

By entry-log, I mean entry data (messages), which is stored in data blocks (aka 
pages or chunks). These data blocks can be cached on demand, using an LRU 
replacement policy. 
{quote}
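To make the on-demand LRU caching of data blocks concrete, here is a minimal Java sketch. It assumes a block is addressed by (entry-log id, block offset) and that a caller-supplied loader reads the block from disk on a miss; `BlockCache`, `BlockKey` and `getOrLoad` are hypothetical names, not BookKeeper APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical on-demand cache of entry-log data blocks with LRU eviction.
 * BlockCache, BlockKey and getOrLoad are illustrative names, not BookKeeper APIs.
 */
class BlockCache {
    /** Identifies a data block by (entry-log id, block offset); an assumed key layout. */
    static final class BlockKey {
        final long logId;
        final long blockOffset;

        BlockKey(long logId, long blockOffset) {
            this.logId = logId;
            this.blockOffset = blockOffset;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof BlockKey)) return false;
            BlockKey k = (BlockKey) o;
            return logId == k.logId && blockOffset == k.blockOffset;
        }

        @Override
        public int hashCode() {
            return Objects.hash(logId, blockOffset);
        }
    }

    private final int maxBlocks;
    private final LinkedHashMap<BlockKey, byte[]> blocks;
    private long misses = 0;

    BlockCache(int maxBlocks) {
        this.maxBlocks = maxBlocks;
        // accessOrder = true: iteration runs from least to most recently accessed,
        // so the "eldest" entry is the LRU victim.
        this.blocks = new LinkedHashMap<BlockKey, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<BlockKey, byte[]> eldest) {
                return size() > BlockCache.this.maxBlocks;
            }
        };
    }

    /** Returns the cached block, calling the loader (e.g. a disk read) only on a miss. */
    synchronized byte[] getOrLoad(BlockKey key, Function<BlockKey, byte[]> loader) {
        byte[] block = blocks.get(key); // get() also refreshes recency
        if (block == null) {
            misses++;
            block = loader.apply(key);
            blocks.put(key, block); // may evict the least recently used block
        }
        return block;
    }

    /** containsKey does not change access order in a LinkedHashMap. */
    synchronized boolean contains(BlockKey key) {
        return blocks.containsKey(key);
    }

    synchronized long misses() {
        return misses;
    }
}
```

An access-ordered `LinkedHashMap` with `removeEldestEntry` gives LRU eviction with very little code; a real block cache would more likely bound memory in bytes rather than in block count.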
By write-ahead log, I mean the use case that BK as a whole is designed for: a 
write-ahead log, which by its nature should seldom be read. I'm not referring to 
the bookie's own journal. I can imagine one use case where multiple reads could 
be necessary (such as Hedwig, with many subscribers consuming at different 
rates), but that should be handled at a higher level (such as the read-ahead 
cache in Hedwig).
                
> Improve performance of entry log range read per ledger entries 
> ---------------------------------------------------------------
>
>                 Key: BOOKKEEPER-432
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-432
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>         Environment: Linux
>            Reporter: Yixue (Andrew) Zhu
>            Assignee: Yixue (Andrew) Zhu
>              Labels: patch
>         Attachments: BookieLedgerStorageProposal.pdf
>
>
> We observed random I/O reads when some subscribers fall behind (on some 
> topics), as delivery needs to scan the entry logs (through the ledger index), 
> in which entries from all ledgers being served are interleaved.
> Essentially, the ledger index is a non-clustered index. It is not effective 
> when a large number of ledger entries need to be served, because those entries 
> tend to be scattered around due to interleaving.
> Some possible improvements:
> 1. Change the ledger entries buffer to use a SkipList (or another suitable 
> structure), sorted on (ledger, entry sequence). When the buffer is flushed, the 
> entry log is written out in the already-sorted order. 
> The "active" ledger index can point into the entries buffer (SkipList), and be 
> fixed up with the entry-log position once the latter is persisted.
> Or, the ledger index can simply be rebuilt on demand. The entry log file tail 
> can have an index attached (a light-weight b-tree, similar to Bigtable). We 
> need to track, per ledger, which log files contribute entries to it, so that 
> the in-memory index can be rebuilt from the tails of the corresponding log files.
> 2. Use an affinity concept to make the ensembles of ledgers (belonging to the 
> same topic) as identical as possible. This will make 1. above more effective.
>  
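The sorted-buffer part of item 1 in the issue description can be sketched in Java as follows. This assumes a single-process buffer whose methods are simply `synchronized` (a real bookie would need finer-grained concurrency and a real file); `ConcurrentSkipListMap` supplies the (ledger, entry) ordering, while `SortedEntryBuffer`, `EntryKey` and the in-memory `ByteArrayOutputStream` standing in for the entry log are illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Hypothetical sketch: buffer entries sorted on (ledger, entry), flush them in that
 * order, then fix up the index with entry-log positions. Names are illustrative;
 * an in-memory byte stream stands in for the entry-log file.
 */
class SortedEntryBuffer {
    /** Composite key ordered by ledger id, then entry id. */
    static final class EntryKey implements Comparable<EntryKey> {
        final long ledgerId;
        final long entryId;

        EntryKey(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        @Override
        public int compareTo(EntryKey o) {
            int c = Long.compare(ledgerId, o.ledgerId);
            return c != 0 ? c : Long.compare(entryId, o.entryId);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof EntryKey && compareTo((EntryKey) o) == 0;
        }

        @Override
        public int hashCode() {
            return Objects.hash(ledgerId, entryId);
        }
    }

    // Unflushed entries, kept sorted by (ledgerId, entryId).
    private final ConcurrentSkipListMap<EntryKey, byte[]> buffer = new ConcurrentSkipListMap<>();
    // Index of flushed entries: key -> {offset, length} in the entry log.
    private final Map<EntryKey, long[]> index = new HashMap<>();
    private final ByteArrayOutputStream entryLog = new ByteArrayOutputStream();

    synchronized void add(long ledgerId, long entryId, byte[] data) {
        buffer.put(new EntryKey(ledgerId, entryId), data);
    }

    /** Serves from the buffer if not yet flushed, otherwise via the fixed-up index. */
    synchronized byte[] read(long ledgerId, long entryId) {
        EntryKey key = new EntryKey(ledgerId, entryId);
        byte[] buffered = buffer.get(key);
        if (buffered != null) {
            return buffered;
        }
        long[] loc = index.get(key);
        if (loc == null) {
            return null;
        }
        byte[] log = entryLog.toByteArray();
        return Arrays.copyOfRange(log, (int) loc[0], (int) (loc[0] + loc[1]));
    }

    /** Entry-log offset of a flushed entry, or -1 if it is still only in the buffer. */
    synchronized long positionOf(long ledgerId, long entryId) {
        long[] loc = index.get(new EntryKey(ledgerId, entryId));
        return loc == null ? -1 : loc[0];
    }

    /** Writes buffered entries in sorted order, so each ledger's entries are contiguous. */
    synchronized void flush() {
        Map.Entry<EntryKey, byte[]> e;
        while ((e = buffer.pollFirstEntry()) != null) {
            byte[] data = e.getValue();
            long offset = entryLog.size();
            entryLog.write(data, 0, data.length);
            // Fix up the index now that the entry has a persistent position.
            index.put(e.getKey(), new long[] {offset, data.length});
        }
    }
}
```

Because the flush walks keys in (ledger, entry) order, entries that arrived interleaved across ledgers land contiguously per ledger within each flushed batch, so a lagging subscriber's catch-up read becomes a sequential scan of that region rather than scattered random reads.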

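The "rebuild on demand" alternative from item 1 can be sketched in the same way. This assumes each sealed log file carries a tail index mapping ledger id to (entry id -> offset), with a `TreeMap` standing in for the light-weight b-tree, plus a per-ledger set of contributing files so that a rebuild reads only those tails. `TailIndexedLogs`, `recordEntry`, `seal` and `rebuildIndex` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical sketch: each sealed entry-log file keeps a tail index, and a
 * per-ledger registry records which files hold that ledger's entries.
 * A TreeMap stands in for the on-disk light-weight b-tree.
 */
class TailIndexedLogs {
    // Tail index per sealed log file: ledgerId -> (entryId -> offset within that file).
    private final Map<Long, Map<Long, NavigableMap<Long, Long>>> tailIndexes = new HashMap<>();
    // Tail index for the log file currently being written (not yet sealed).
    private final Map<Long, NavigableMap<Long, Long>> pendingTail = new TreeMap<>();
    // Per-ledger set of log files that contributed entries to it.
    private final Map<Long, Set<Long>> filesForLedger = new HashMap<>();

    /** Records that an entry was appended to the current log file at the given offset. */
    void recordEntry(long ledgerId, long entryId, long offset) {
        pendingTail.computeIfAbsent(ledgerId, k -> new TreeMap<>()).put(entryId, offset);
    }

    /** Seals the current log file: keep its tail index and note contributing ledgers. */
    void seal(long logId) {
        Map<Long, NavigableMap<Long, Long>> tail = new TreeMap<>(pendingTail);
        tailIndexes.put(logId, tail);
        for (Long ledgerId : tail.keySet()) {
            filesForLedger.computeIfAbsent(ledgerId, k -> new TreeSet<>()).add(logId);
        }
        pendingTail.clear();
    }

    Set<Long> contributingFiles(long ledgerId) {
        return filesForLedger.getOrDefault(ledgerId, Set.of());
    }

    /** Rebuilds a ledger's in-memory index: entryId -> {logId, offset}. */
    NavigableMap<Long, long[]> rebuildIndex(long ledgerId) {
        NavigableMap<Long, long[]> result = new TreeMap<>();
        for (long logId : contributingFiles(ledgerId)) {
            // Only tails of files that actually hold this ledger's entries are read.
            NavigableMap<Long, Long> entries = tailIndexes.get(logId).get(ledgerId);
            for (Map.Entry<Long, Long> e : entries.entrySet()) {
                result.put(e.getKey(), new long[] {logId, e.getValue()});
            }
        }
        return result;
    }
}
```

The per-ledger file set is what keeps a rebuild cheap: a ledger that wrote to only a few log files costs only a few tail reads, regardless of how many log files the bookie holds.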
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
