[
https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214179#comment-13214179
]
Todd Lipcon commented on HBASE-3484:
------------------------------------
bq. How you think Todd? Because of the tiering cost more or is it something to
do w/ mslab allocations?
Extra container costs with the fact that we have extra CSLM objects for each
row. I haven't measured but I bet there is some map-wide overhead that we're
paying.
There are some other things I noticed that could be improved, though. In
particular, CSLM optimizes for Comparable keys, so if you specify a custom
comparator, then it has to wrap every key you insert with a wrapper object.
Specializing CSLM for our purposes would easily save 64 bytes per entry on this.
Another thought I had was to do the following:
- have the actual entries in rowMap be Object-typed, rather than CSLMs.
- when the first insert happens, just insert the KeyValue itself (optimization
for the case where each row has only one cell)
- when more inserts happen, swap it out for a proper container type
The proper container type's also interesting to consider here. We never have
contention on update within a row, since the updates happen under a row lock,
right? So, we can consider any map type that supports single-writer
multiple-reader efficiently, which is a wider range of data structures than
support multi-writer multi-reader. One possibility is snap trees or even
copy-on-write sorted array lists.
bq. What would you like to see test-wise proving this direction better than
what we currently have? I could work up some tests?
Would be great if we had a benchmark focused on memstore-only which allowed a
mix of the following operations from different threads:
- full scans
- range scans
- updates to existing rows which just touch 1 or a few columns
- updates to existing rows which touch lots of columns
- inserts of new rows (few or lots of columns)
But it's a bit of work to do all that. So, a microbenchmark which just timed
something like having 20 threads each do a bunch of inserts with multi-column
rows would at least show whether there's promise here.
> Replace memstore's ConcurrentSkipListMap with our own implementation
> --------------------------------------------------------------------
>
> Key: HBASE-3484
> URL: https://issues.apache.org/jira/browse/HBASE-3484
> Project: HBase
> Issue Type: Improvement
> Components: performance
> Affects Versions: 0.92.0
> Reporter: Todd Lipcon
> Priority: Critical
> Attachments: hierarchical-map.txt
>
>
> By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements
> to it for our use case in MemStore:
> - add an iterator.replace() method which should allow us to do upsert much
> more cheaply
> - implement a Set directly without having to do Map<KeyValue,KeyValue> to
> save one reference per entry
> It turns out CSLM is in public domain from its development as part of JSR
> 166, so we should be OK with licenses.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira