[ 
https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13756557#comment-13756557
 ] 

Anoop Sam John commented on HBASE-3484:
---------------------------------------

I am trying out something similar to how a store can hold multiple HFiles: 
within a memstore there can be more than one KeyValueSkipListSet object at a 
time (and so more than one CSLM).
Each KeyValueSkipListSet slice has a configurable max size. 
Initially there is only one KeyValueSkipListSet in the memstore. Once its 
size reaches the threshold, we create another KeyValueSkipListSet (so a 
new CSLM) and new KVs are inserted into it. The old data structure does not 
receive KVs again. So KVs are sorted within *one KeyValueSkipListSet*, but 
not across slices. At flush time, all these KeyValueSkipListSets are taken 
into the snapshot and written to an HFile. We will need changes in the 
MemstoreScanner so that it treats the slices as a heap and emits KVs in the 
correct order.
Once the flush is over there is again only one KeyValueSkipListSet in the 
memstore, and the cycle repeats. Basically this tries to avoid a single CSLM 
growing very large with a high number of entries.
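
To make the idea concrete, here is a minimal sketch of the slicing and the 
heap-based scan. It is not the POC patch: the class SlicedMemStoreSketch, its 
methods, and the entry-count threshold are hypothetical, plain Strings stand 
in for KeyValues, and ConcurrentSkipListSet stands in for 
KeyValueSkipListSet. It also assumes keys are unique across slices, which a 
real memstore would get from the KeyValue comparator.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical sketch: a memstore split into bounded, individually sorted slices. */
class SlicedMemStoreSketch {
    // Assumption: the slice limit is an entry count; a real limit would likely be heap size.
    private final int maxEntriesPerSlice;
    private final List<ConcurrentSkipListSet<String>> slices = new ArrayList<>();
    private int activeEntries = 0; // entries in the newest (active) slice

    SlicedMemStoreSketch(int maxEntriesPerSlice) {
        this.maxEntriesPerSlice = maxEntriesPerSlice;
        slices.add(new ConcurrentSkipListSet<>());
    }

    /** Inserts into the newest slice, rolling over to a fresh one once it is full. */
    synchronized void add(String kv) {
        if (activeEntries >= maxEntriesPerSlice) {
            slices.add(new ConcurrentSkipListSet<>()); // older slices get no more writes
            activeEntries = 0;
        }
        if (slices.get(slices.size() - 1).add(kv)) {
            activeEntries++;
        }
    }

    synchronized int sliceCount() {
        return slices.size();
    }

    /** Merges all slices through a heap so entries come out in global sorted order. */
    synchronized List<String> scan() {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (ConcurrentSkipListSet<String> slice : slices) {
            Iterator<String> it = slice.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            out.add(smallest.head);
            if (smallest.advance()) {
                heap.add(smallest); // re-insert with its next entry as the new head
            }
        }
        return out;
    }

    /** Simulates a flush: returns all entries in order and resets to one empty slice. */
    synchronized List<String> flush() {
        List<String> snapshot = scan();
        slices.clear();
        slices.add(new ConcurrentSkipListSet<>());
        activeEntries = 0;
        return snapshot;
    }

    /** Iterator over one slice plus its current head entry. */
    private static final class Cursor {
        private final Iterator<String> it;
        String head;

        Cursor(Iterator<String> it) {
            this.it = it;
            this.head = it.next();
        }

        boolean advance() {
            if (!it.hasNext()) {
                return false;
            }
            head = it.next();
            return true;
        }
    }
}
```

Each slice stays small, so inserts pay the skip list cost of one slice only, 
while a scan pays an extra heap step per entry that grows with the number of 
slices.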

By default there is no max size for a slice, so a single CSLM keeps growing 
as long as KVs are inserted into the memstore before a flush.

I have done a POC and tested it. The initial test with LoadTestTool shows 
that we can avoid the decrease in throughput as the memstore grows. I will 
attach a patch with this change by this weekend.
                
> Replace memstore's ConcurrentSkipListMap with our own implementation
> --------------------------------------------------------------------
>
>                 Key: HBASE-3484
>                 URL: https://issues.apache.org/jira/browse/HBASE-3484
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 0.92.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>         Attachments: hierarchical-map.txt, memstore_drag.png
>
>
> By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements 
> to it for our use case in MemStore:
> - add an iterator.replace() method which should allow us to do upsert much 
> more cheaply
> - implement a Set directly without having to do Map<KeyValue,KeyValue> to 
> save one reference per entry
> It turns out CSLM is in public domain from its development as part of JSR 
> 166, so we should be OK with licenses.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
