[
https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756557#comment-13756557
]
Anoop Sam John commented on HBASE-3484:
---------------------------------------
I am trying out something similar to how there can be multiple HFiles within
a store. Within a memstore there can be more than one KeyValueSkipListSet
object at a time (and so more than one CSLM).
Each KeyValueSkipListSet slice has a configurable max size.
Initially there will be only one KeyValueSkipListSet in the memstore. Once its
size reaches the threshold, we create another KeyValueSkipListSet (so a new
CSLM) and new KVs are inserted into that one. The old data structure won't get
KVs again. So KVs are sorted within *one KeyValueSkipListSet*, but not across
them. This continues, and finally all these KeyValueSkipListSets are taken
into the snapshot and written to an HFile. We will need changes in the
MemstoreScanner so that it treats the slices as a heap and emits KVs in the
correct order.
Once the flush is over, there will again be only one KeyValueSkipListSet in
the memstore, and the cycle repeats. Basically this tries to avoid a single
CSLM growing very large with a high number of entries.
By default there is no max size for a slice, so the single CSLM keeps growing
as long as KVs are inserted into the memstore before a flush.
I have done a POC and tested it. The initial test with LoadTestTool shows that
we can avoid the decrease in throughput as the memstore grows. I will attach a
patch with this change by this weekend.
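
To make the idea concrete, here is a minimal sketch of the slicing and the
heap-based scan described above. It is not the POC patch and not HBase code:
the class name SlicedMemstoreSketch and its methods are hypothetical, plain
String keys stand in for KeyValue, java.util.concurrent.ConcurrentSkipListSet
stands in for KeyValueSkipListSet, and the slice limit is counted in entries
rather than heap size. Writes are simply synchronized here, which a real
memstore would not do.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ConcurrentSkipListSet;

// Hypothetical illustration only; names and sizing policy are assumptions.
public class SlicedMemstoreSketch {
    private final List<ConcurrentSkipListSet<String>> slices = new ArrayList<>();
    private final long maxEntriesPerSlice;
    private long activeCount = 0; // entries in the newest slice

    public SlicedMemstoreSketch(long maxEntriesPerSlice) {
        this.maxEntriesPerSlice = maxEntriesPerSlice;
        slices.add(new ConcurrentSkipListSet<>());
    }

    // Default: no limit, so a single slice keeps growing until flush.
    public SlicedMemstoreSketch() {
        this(Long.MAX_VALUE);
    }

    // Insert into the newest slice; roll over to a fresh one at the threshold.
    // Older slices never receive new entries.
    public synchronized void add(String kv) {
        if (activeCount >= maxEntriesPerSlice) {
            slices.add(new ConcurrentSkipListSet<>());
            activeCount = 0;
        }
        if (slices.get(slices.size() - 1).add(kv)) {
            activeCount++;
        }
    }

    public synchronized int sliceCount() {
        return slices.size();
    }

    // One cursor per slice: its current head plus the remaining iterator.
    private static final class Cursor {
        String head;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.head = it.next();
        }
    }

    // Merge the individually sorted slices through a heap so that entries
    // come out in global order. A key present in two slices is emitted twice;
    // real KeyValues would be distinguished by their comparator.
    public synchronized List<String> scan() {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (ConcurrentSkipListSet<String> slice : slices) {
            if (!slice.isEmpty()) {
                heap.add(new Cursor(slice.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            out.add(c.head);
            if (c.rest.hasNext()) {
                c.head = c.rest.next();
                heap.add(c);
            }
        }
        return out;
    }

    // "Flush": hand back everything in sorted order and start over with a
    // single empty slice.
    public synchronized List<String> flush() {
        List<String> snapshot = scan();
        slices.clear();
        slices.add(new ConcurrentSkipListSet<>());
        activeCount = 0;
        return snapshot;
    }
}
```

With a limit of 2 entries per slice, inserting five keys produces three
slices, and scan() still returns them in sorted order. After flush() the
sketch is back to one empty slice.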
> Replace memstore's ConcurrentSkipListMap with our own implementation
> --------------------------------------------------------------------
>
> Key: HBASE-3484
> URL: https://issues.apache.org/jira/browse/HBASE-3484
> Project: HBase
> Issue Type: Improvement
> Components: Performance
> Affects Versions: 0.92.0
> Reporter: Todd Lipcon
> Priority: Critical
> Attachments: hierarchical-map.txt, memstore_drag.png
>
>
> By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements
> to it for our use case in MemStore:
> - add an iterator.replace() method which should allow us to do upsert much
> more cheaply
> - implement a Set directly without having to do Map<KeyValue,KeyValue> to
> save one reference per entry
> It turns out CSLM is in public domain from its development as part of JSR
> 166, so we should be OK with licenses.