[jira] [Commented] (BOOKKEEPER-432) Improve performance of entry log range read per ledger entries

Ivan Kelly (JIRA) Wed, 15 May 2013 01:55:26 -0700

    [ https://issues.apache.org/jira/browse/BOOKKEEPER-432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13658177#comment-13658177 ]


Ivan Kelly commented on BOOKKEEPER-432:
---------------------------------------

The problem stems from the change in the entry logger: on an addEntry, 
InterleavedLedgerStorage will roll over to a new entry log file if a limit is 
reached, but SortedLedgerStorage will not. Since the number of log files is 
what the test checks, the two will give different counts. So it's natural 
enough that the test runs fine if we just disable sorted storage for 
LedgerDeleteTest. I think the other test fix from that changelist is already 
in anyhow.
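To make the log-count difference concrete, here is a toy Java model. It is not BookKeeper's actual EntryLogger: `LogRollDemo`, `ToyEntryLogger` and their methods are hypothetical, and the "sorted" variant is assumed to skip the limit check on addEntry entirely (the point at which its buffered entries actually reach disk is ignored). With a 250-byte limit and ten 100-byte entries, checking the limit on every add produces five log files, while skipping the check produces one.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model (not BookKeeper code): shows why the number of
 * entry log files differs when one storage checks the size limit on every
 * addEntry and the other does not.
 */
public class LogRollDemo {

    static final class ToyEntryLogger {
        private final long logSizeLimit;
        private final List<Long> logSizes = new ArrayList<>(); // bytes written per log file

        ToyEntryLogger(long logSizeLimit) {
            this.logSizeLimit = logSizeLimit;
            logSizes.add(0L); // one log file is open from the start
        }

        /** Appends an entry, first rolling to a new file if rollOnAdd is set and the entry would overflow. */
        void addEntry(int size, boolean rollOnAdd) {
            int cur = logSizes.size() - 1;
            long used = logSizes.get(cur);
            if (rollOnAdd && used > 0 && used + size > logSizeLimit) {
                logSizes.add(0L); // roll: open a fresh log file
                cur++;
                used = 0;
            }
            logSizes.set(cur, used + size);
        }

        int numLogFiles() {
            return logSizes.size();
        }
    }

    /** Writes `count` entries of `size` bytes and returns how many log files exist afterwards. */
    public static int logFilesAfter(int count, int size, long limit, boolean rollOnAdd) {
        ToyEntryLogger logger = new ToyEntryLogger(limit);
        for (int i = 0; i < count; i++) {
            logger.addEntry(size, rollOnAdd);
        }
        return logger.numLogFiles();
    }

    public static void main(String[] args) {
        // Interleaved-style: limit checked on every add, two 100-byte entries fit per file.
        System.out.println(logFilesAfter(10, 100, 250, true));  // 5
        // Sorted-style, as modeled here: no roll check on add.
        System.out.println(logFilesAfter(10, 100, 250, false)); // 1
    }
}
```

A test such as LedgerDeleteTest that asserts on the file count will therefore see different numbers depending on which storage is configured.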

For the 4.3.0 release, I'm going to disable sorted storage by default. But I do 
want some tests to run against the new ledger storage too. I'll try to pick out 
a subset to run, but that will also be a big patch, so I may do it in a 
separate JIRA.
> Improve performance of entry log range read per ledger entries 
> ---------------------------------------------------------------
>
>                 Key: BOOKKEEPER-432
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-432
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>         Environment: Linux
>            Reporter: Yixue (Andrew) Zhu
>            Assignee: Yixue (Andrew) Zhu
>              Labels: patch
>             Fix For: 4.3.0
>
>         Attachments: 0001-BOOKKEEPER-432-First-pass.patch, 
> BookieLedgerStorageProposal.pdf, PortSkipListLedgerStore.patch
>
>
> We observed random I/O reads when some subscribers fall behind (on some 
> topics), as delivery needs to scan the entry logs (through the ledger index), 
> in which the entries of all ledgers being served are interleaved.
> Essentially, the ledger index is a non-clustered index. It is not effective 
> when a large number of ledger entries need to be served, since those entries 
> tend to be scattered by the interleaving.
> Some possible improvements:
> 1. Change the ledger entries buffer to use a SkipList (or another suitable 
> structure), sorted on (ledger, entry sequence). When the buffer is flushed, 
> the entry log is written out in the already-sorted order. 
> The "active" ledger index can point into the entries buffer (SkipList) and be 
> fixed up with the entry-log position once the latter is persisted.
> Alternatively, the ledger index can simply be rebuilt on demand. The entry log 
> file tail can have an index attached (a lightweight B-tree, similar to 
> Bigtable). We need to track, per ledger, which log files contribute entries 
> to it, so that the in-memory index can be rebuilt from the tails of the 
> corresponding log files.
> 2. Use an affinity concept to make the ensembles of ledgers belonging to the 
> same topic as identical as possible. This will make 1. above more effective.
>  
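Improvement 1 can be sketched in Java roughly as below. This is a minimal, single-threaded sketch under stated assumptions, not the attached patch: `SortedEntryBuffer`, `EntryKey` and all method names are hypothetical, entry log files are simulated as in-memory lists, and the per-file tail index is left out. `java.util.concurrent.ConcurrentSkipListMap` provides the skip list ordered on (ledger, entry), a flush emits entries in that order so each ledger's entries end up contiguous, and a per-ledger set records which log files hold that ledger's entries (the bookkeeping needed to rebuild an index from log tails).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Hypothetical sketch (not BookKeeper's API): buffer entries sorted by
 * (ledgerId, entryId), flush them in that order, and track which simulated
 * log files contain entries for each ledger. Assumes no adds run concurrently
 * with flush().
 */
public class SortedEntryBuffer {

    /** Composite key ordered by ledger first, then entry id. */
    public record EntryKey(long ledgerId, long entryId) implements Comparable<EntryKey> {
        @Override
        public int compareTo(EntryKey o) {
            int c = Long.compare(ledgerId, o.ledgerId);
            return c != 0 ? c : Long.compare(entryId, o.entryId);
        }
    }

    private final ConcurrentSkipListMap<EntryKey, byte[]> buffer = new ConcurrentSkipListMap<>();
    // ledgerId -> ids of simulated log files containing entries of that ledger
    private final Map<Long, Set<Integer>> ledgerToLogs = new HashMap<>();
    // simulated entry log files, each holding keys in the order they were written
    private final List<List<EntryKey>> logs = new ArrayList<>();

    /** Adds an entry in arrival order; the skip list keeps the buffer sorted. */
    public void addEntry(long ledgerId, long entryId, byte[] data) {
        buffer.put(new EntryKey(ledgerId, entryId), data);
    }

    /** Serves a read from the buffer (the "active" index) before the entry is flushed; null if absent. */
    public byte[] getEntry(long ledgerId, long entryId) {
        return buffer.get(new EntryKey(ledgerId, entryId));
    }

    /** Writes the whole buffer as one new log file in (ledger, entry) order and returns its id. */
    public int flush() {
        int logId = logs.size();
        List<EntryKey> log = new ArrayList<>();
        for (Map.Entry<EntryKey, byte[]> e : buffer.entrySet()) { // ascending key order
            log.add(e.getKey());
            ledgerToLogs.computeIfAbsent(e.getKey().ledgerId(), k -> new TreeSet<>()).add(logId);
        }
        logs.add(log);
        buffer.clear();
        return logId;
    }

    /** Keys written to the given simulated log file, in file order. */
    public List<EntryKey> logContents(int logId) {
        return logs.get(logId);
    }

    /** Log files an index rebuild for this ledger would need to read. */
    public Set<Integer> logsForLedger(long ledgerId) {
        return ledgerToLogs.getOrDefault(ledgerId, Set.of());
    }
}
```

A skip list keeps adds cheap (expected O(log n)) while turning the flush into a simple in-order scan. A real implementation would also need to swap in a fresh buffer atomically, so that adds arriving during a flush are not lost.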

