[ 
https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214179#comment-13214179
 ] 

Todd Lipcon commented on HBASE-3484:
------------------------------------

bq. How you think Todd? Because of the tiering cost more or is it something to 
do w/ mslab allocations?

Extra container costs with the fact that we have extra CSLM objects for each 
row. I haven't measured but I bet there is some map-wide overhead that we're 
paying.

There are some other things I noticed that could be improved, though. In 
particular, CSLM optimizes for Comparable keys, so if you specify a custom 
comparator, then it has to wrap every key you insert with a wrapper object. 
Specializing CSLM for our purposes would easily save 64 bytes per entry on this.

Another thought I had was to do the following:
- have the actual entries in rowMap be Object-typed, rather than CSLMs.
- when the first insert happens, just insert the KeyValue itself (optimization 
for the case where each row has only one cell)
- when more inserts happen, swap it out for a proper container type

The proper container type's also interesting to consider here. We never have 
contention on update within a row, since the updates happen under a row lock, 
right? So, we can consider any map type that supports single-writer 
multiple-reader efficiently, which is a wider range of data structures than 
support multi-writer multi-reader. One possibility is snap trees or even 
copy-on-write sorted array lists.

bq. What would you like to see test-wise proving this direction better than 
what we currently have? I could work up some tests?
Would be great if we had a benchmark focused on memstore-only which allowed a 
mix of the following operations from different threads:
- full scans
- range scans
- updates to existing rows which just touch 1 or a few columns
- updates to existing rows which touch lots of columns
- inserts of new rows (few or lots of columns)

But it's a bit of work to do all that. So, a microbenchmark which just timed 
something like having 20 threads each do a bunch of inserts with multi-column 
rows would at least show whether there's promise here.
                
> Replace memstore's ConcurrentSkipListMap with our own implementation
> --------------------------------------------------------------------
>
>                 Key: HBASE-3484
>                 URL: https://issues.apache.org/jira/browse/HBASE-3484
>             Project: HBase
>          Issue Type: Improvement
>          Components: performance
>    Affects Versions: 0.92.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>         Attachments: hierarchical-map.txt
>
>
> By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements 
> to it for our use case in MemStore:
> - add an iterator.replace() method which should allow us to do upsert much 
> more cheaply
> - implement a Set directly without having to do Map<KeyValue,KeyValue> to 
> save one reference per entry
> It turns out CSLM is in public domain from its development as part of JSR 
> 166, so we should be OK with licenses.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to