[ 
https://issues.apache.org/jira/browse/BOOKKEEPER-432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13489536#comment-13489536
 ] 

Yixue (Andrew) Zhu commented on BOOKKEEPER-432:
-----------------------------------------------

Thanks, Ivan, for the graph. Regarding the write throughput, does the graph 
reflect compaction in HBase? Are the regions hosted on one machine? It 
reinforces my argument for not using HBase or LevelDB as is. 

The proposal will not use HBase's compaction or multi-region design, so the 
graph does not reflect the write performance of the proposal.

Re - "So all entries for a single ledger will be stored sequentially on disk in 
a single file."
Not all entries. One cluster of entries is stored in one file, the next 
cluster in another file.

The reason for not storing the offset in index entries is that the offset is 
not finalized until the SkipList is flushed. An on-demand page cache for index 
entries should bring read performance on par.

Of course, we can pin the index entries in memory until the SkipList is 
flushed.
I can go with this approach. Does it address your concern, Ivan?
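
To make the idea concrete, here is a minimal sketch of a buffer sorted on 
(ledger, entry sequence) whose index offsets are only assigned at flush time. 
All class and method names are hypothetical and not existing BookKeeper code, 
the entry log is modelled as a plain in-memory stream, and flush is assumed to 
run without concurrent writers.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: none of these names exist in BookKeeper.
class SortedEntryBuffer {
    /** Sort key: ledger id first, then entry id within the ledger. */
    static final class Key implements Comparable<Key> {
        final long ledgerId;
        final long entryId;

        Key(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        @Override
        public int compareTo(Key other) {
            int c = Long.compare(ledgerId, other.ledgerId);
            return c != 0 ? c : Long.compare(entryId, other.entryId);
        }
    }

    /** Index entry created at flush time, when the offset is final. */
    static final class IndexEntry {
        final long ledgerId;
        final long entryId;
        final long offset;

        IndexEntry(long ledgerId, long entryId, long offset) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
            this.offset = offset;
        }
    }

    private final ConcurrentSkipListMap<Key, byte[]> buffer =
            new ConcurrentSkipListMap<>();

    void add(long ledgerId, long entryId, byte[] data) {
        buffer.put(new Key(ledgerId, entryId), data);
    }

    /** Until the flush, reads are served straight from the in-memory buffer. */
    byte[] read(long ledgerId, long entryId) {
        return buffer.get(new Key(ledgerId, entryId));
    }

    /**
     * Appends all buffered entries to the log in (ledger, entry) order and
     * returns index entries carrying the now-final offsets.
     */
    List<IndexEntry> flush(ByteArrayOutputStream log) {
        List<IndexEntry> index = new ArrayList<>();
        for (Map.Entry<Key, byte[]> e : buffer.entrySet()) {
            long offset = log.size(); // position of this entry in the log
            byte[] data = e.getValue();
            log.write(data, 0, data.length);
            index.add(new IndexEntry(e.getKey().ledgerId, e.getKey().entryId, offset));
        }
        buffer.clear(); // assumes no writer added entries during the flush
        return index;
    }
}
```

Because the map iterates in key order, all entries of one ledger end up 
contiguous in the flushed file, and the index only ever sees final offsets.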

> Improve performance of entry log range read per ledger entries 
> ---------------------------------------------------------------
>
>                 Key: BOOKKEEPER-432
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-432
>             Project: Bookkeeper
>          Issue Type: Improvement
>          Components: bookkeeper-server
>    Affects Versions: 4.2.0
>         Environment: Linux
>            Reporter: Yixue (Andrew) Zhu
>            Assignee: Yixue (Andrew) Zhu
>              Labels: patch
>         Attachments: BookieLedgerStorageProposal.pdf
>
>
> We observed random I/O reads when some subscribers fall behind (on some 
> topics), as delivery needs to scan the entry logs (through the ledger index), 
> in which entries from all ledgers being served are interleaved.
> Essentially, the ledger index is a non-clustered index. It is not effective 
> when a large number of ledger entries need to be served, which tend to be 
> scattered around due to interleaving.
> Some possible improvements:
> 1. Change the ledger entries buffer to use a SkipList (or another suitable 
> structure), sorted on (ledger, entry sequence). When the buffer is flushed, 
> the entry log is written out in the already-sorted order. 
> The "active" ledger index can point to the entries buffer (SkipList), and be 
> fixed up with the entry-log position once the latter is persisted.
> Or, the ledger index can just be rebuilt on demand. The entry log file tail 
> can have an index attached (a light-weight B-tree, similar to Bigtable). We 
> need to track per ledger which log files contribute entries to it, so that 
> the in-memory index can be rebuilt from the tails of the corresponding log files.
> 2. Use an affinity concept to make the ensembles of ledgers (belonging to the 
> same topic) as identical as possible. This will help item 1 above be more effective.
>  
