[
https://issues.apache.org/jira/browse/HBASE-14708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992171#comment-14992171
]
Elliott Clark commented on HBASE-14708:
---------------------------------------
bq. It is quite confusing that some of the methods provide copies, and some of
the methods expose the internal data that the client should not modify.
That's only for the tree map. The array-based version has non-modifiable
versions of everything. The tree map is less risky, as it relies on a known and
tested map implementation. However, it's pretty specialized and should only
really ever be used in the meta cache.
The array-based map is faster and more complete. However, it is also more risky.
My thought was that in branch-1.2 we would use the tree map and in branch-1+
we would use the array map.
> Use copy on write Map for region location cache
> -----------------------------------------------
>
> Key: HBASE-14708
> URL: https://issues.apache.org/jira/browse/HBASE-14708
> Project: HBase
> Issue Type: Improvement
> Components: Client
> Affects Versions: 1.1.2
> Reporter: Elliott Clark
> Assignee: Elliott Clark
> Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: HBASE-14708-v10.patch, HBASE-14708-v11.patch,
> HBASE-14708-v12.patch, HBASE-14708-v2.patch, HBASE-14708-v3.patch,
> HBASE-14708-v4.patch, HBASE-14708-v5.patch, HBASE-14708-v6.patch,
> HBASE-14708-v7.patch, HBASE-14708-v8.patch, HBASE-14708-v9.patch,
> HBASE-14708.patch, anotherbench.zip, location_cache_times.pdf, result.csv
>
>
> Internally, a co-worker profiled their application that was talking to HBase.
> More than 60% of the time was spent locating a region. This was while the
> cluster was stable and no regions were moving.
> To figure out if there was a faster way to cache region location I wrote up a
> benchmark here: https://github.com/elliottneilclark/benchmark-hbase-cache
> This tries to simulate a heavy load on the location cache.
> * 24 different threads.
> * 2 Deleting location data
> * 2 Adding location data
> * Using floor to get the result.
> To repeat my work just run ./run.sh and it should produce a result.csv
> Results:
> ConcurrentSkipListMap is a good middle ground. It has equal speed for
> reading and writing.
> However, most operations will not need to remove or add a region location.
> There will potentially be several orders of magnitude more reads of cached
> locations than operations that clear the cache.
> So I propose a copy on write tree map.
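For illustration only, the copy-on-write idea proposed above could be sketched roughly as follows. This is not the code from any of the attached patches. The class name CopyOnWriteLocationCache and its methods are hypothetical, and String keys and values stand in for the region start keys and locations that the real client would store. Reads go through a floor lookup on an immutable snapshot without taking a lock, while each write copies the whole map and publishes the copy.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a copy-on-write sorted map for a read-heavy cache.
 * Names and types are assumptions, not taken from the HBASE-14708 patches.
 */
class CopyOnWriteLocationCache {
    // Readers only ever see an immutable snapshot; the volatile write
    // publishes each replacement snapshot to other threads.
    private volatile NavigableMap<String, String> snapshot =
            Collections.unmodifiableNavigableMap(new TreeMap<String, String>());

    /** Lock-free read: value for the greatest start key <= rowKey, or null. */
    String floor(String rowKey) {
        Map.Entry<String, String> entry = snapshot.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }

    /** Write path: copy the current map, modify the copy, then swap it in. */
    synchronized void put(String startKey, String location) {
        TreeMap<String, String> copy = new TreeMap<>(snapshot);
        copy.put(startKey, location);
        snapshot = Collections.unmodifiableNavigableMap(copy);
    }

    /** Removal also pays the full copy cost, which is acceptable if rare. */
    synchronized void remove(String startKey) {
        TreeMap<String, String> copy = new TreeMap<>(snapshot);
        copy.remove(startKey);
        snapshot = Collections.unmodifiableNavigableMap(copy);
    }

    int size() {
        return snapshot.size();
    }
}
```

The trade-off in this sketch is that every add or remove costs time proportional to the size of the map, which only pays off when reads outnumber writes by a wide margin, as the description argues they do for cached region locations.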