[
https://issues.apache.org/jira/browse/BOOKKEEPER-432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489523#comment-13489523
]
Flavio Junqueira commented on BOOKKEEPER-432:
---------------------------------------------
I'm really glad you guys are working on this, and it sounds like it will be a
nice addition to the project. I have a few comments about the proposal;
hopefully they haven't been raised before. I skimmed through the existing
comments and didn't see them, so here they are.

The design says that once a B-tree is created and complete, it will not be
updated. I'm wondering if this means that we accumulate entries in a skiplist
and, once the buffer fills up, organize the entries into a B-tree and flush
it. If so, I suppose that you're thinking of having two buffers so that we can
flush one while filling up the other. In general, my concern here is the
impact on write throughput. The current design has the nice property that
throughput is limited by the speed of the journal and not by the speed of the
ledger store (in the absence of reads).
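
To make the two-buffer question concrete, here is a minimal sketch of how I
read the scheme: entries go into an active skiplist sorted on (ledger, entry),
and when it fills up it is frozen for flushing while a fresh one takes new
writes. The class and method names are hypothetical and are not taken from the
proposal or from the BookKeeper code base, and the flush step only returns the
sorted keys as a stand-in for writing the entry log and its B-tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical sketch of a double-buffered, sorted entry buffer. */
class DoubleBufferedEntryStore {

    /** Sort key: ledger id first, then entry id within the ledger. */
    static final class EntryKey implements Comparable<EntryKey> {
        final long ledgerId;
        final long entryId;

        EntryKey(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        @Override
        public int compareTo(EntryKey o) {
            int c = Long.compare(ledgerId, o.ledgerId);
            return c != 0 ? c : Long.compare(entryId, o.entryId);
        }
    }

    private final int maxEntries;
    private ConcurrentSkipListMap<EntryKey, byte[]> active = new ConcurrentSkipListMap<>();
    private ConcurrentSkipListMap<EntryKey, byte[]> flushing = null;
    private int activeCount = 0; // tracked separately because skiplist size() is O(n)

    DoubleBufferedEntryStore(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Adds an entry; returns true if the active buffer was frozen for flushing. */
    synchronized boolean add(long ledgerId, long entryId, byte[] data) {
        if (active.put(new EntryKey(ledgerId, entryId), data) == null) {
            activeCount++;
        }
        // Rotate only if the previous buffer has already been flushed. If it has
        // not, this sketch lets the active buffer keep growing; a real bookie
        // would have to block or throttle writers here.
        if (activeCount >= maxEntries && flushing == null) {
            flushing = active;
            active = new ConcurrentSkipListMap<>();
            activeCount = 0;
            return true;
        }
        return false;
    }

    /** Drains the frozen buffer, returning its keys in (ledger, entry) order. */
    synchronized List<EntryKey> flush() {
        List<EntryKey> order = new ArrayList<>();
        if (flushing != null) {
            order.addAll(flushing.keySet()); // skiplist iteration is already sorted
            flushing = null;
        }
        return order;
    }
}
```

The rotation branch is where my throughput concern shows up: if flushing the
frozen buffer is slower than filling the active one, writers end up waiting on
the ledger store rather than on the journal.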

I'm also wondering how compaction will affect overall performance. My
intuition is that it shouldn't be much more expensive than the compaction we
have today, but I'm interested in your opinion.

> Improve performance of entry log range read per ledger entries
> ---------------------------------------------------------------
>
> Key: BOOKKEEPER-432
> URL: https://issues.apache.org/jira/browse/BOOKKEEPER-432
> Project: Bookkeeper
> Issue Type: Improvement
> Components: bookkeeper-server
> Affects Versions: 4.2.0
> Environment: Linux
> Reporter: Yixue (Andrew) Zhu
> Assignee: Yixue (Andrew) Zhu
> Labels: patch
> Attachments: BookieLedgerStorageProposal.pdf
>
>
> We observed random read I/O when some subscribers fall behind (on some
> topics), because delivery needs to scan the entry logs (through the ledger
> index), in which entries from all ledgers being served are interleaved.
> Essentially, the ledger index is a non-clustered index. It is not effective
> when a large number of ledger entries need to be served, because those
> entries tend to be scattered across the entry logs due to interleaving.
> Some possible improvements:
> 1. Change the ledger entries buffer to use a SkipList (or another suitable
> sorted structure), sorted on (ledger, entry sequence). When the buffer is
> flushed, the entry log is written out in the already-sorted order.
> The "active" ledger index can point to the entries buffer (SkipList) and be
> fixed up with the entry-log position once the latter is persisted.
> Or, the ledger index can simply be rebuilt on demand. The tail of the entry
> log file can have an index attached (a lightweight B-tree, similar to
> Bigtable). We need to track, per ledger, which log files contribute entries
> to it, so that the in-memory index can be rebuilt from the tails of the
> corresponding log files.
> 2. Use an affinity concept to make the ensembles of ledgers (belonging to
> the same topic) as identical as possible. This will make item 1 above more
> effective.
>