[jira] [Commented] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

Robert Stupp (JIRA) Thu, 06 Aug 2015 00:27:39 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14659619#comment-14659619
 ]


Robert Stupp commented on CASSANDRA-9738:
-----------------------------------------

I’d like to propose this patch for inclusion in 3.0. I hope the cstar tests 
are sufficient; otherwise I can provide more with different workloads.

h2. cstar tests

All cstar tests mentioned below perform three operations: write-only, mixed and 
read-only.
Unfortunately, cassandra-stress seems to limit the achievable write 
throughput for workloads with clustering keys.

All tests on this patch show reduced GC pressure (for reads, of course).
This gives G1 more headroom and often yields a read performance improvement of 
about 10-15%, depending on the hardware (here bdplab vs. blade_11_b); bdplab 
(spinning disks, less RAM) shows the bigger improvement.
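To illustrate why moving entries off-heap reduces GC pressure, here is a toy sketch. It is not the OHC-based cache in the patch. It assumes 8-byte keys (e.g. a hypothetical hash of the partition key) and 8-byte values (an index file position), stores them in a single direct {{ByteBuffer}} with linear probing, and has no eviction. However many entries are cached, the GC only sees one small buffer object, whereas an on-heap concurrent map allocates objects per entry.

```java
import java.nio.ByteBuffer;

/**
 * Toy off-heap table (not the patch's OHC-based cache): each entry is a
 * long key and a long value stored in one direct ByteBuffer, using linear
 * probing. No eviction and no removal; put() fails once the table is full.
 */
final class OffHeapPositionCache {
    private static final int ENTRY_BYTES = 16;          // 8-byte key + 8-byte value
    private static final long EMPTY = Long.MIN_VALUE;   // reserved key marking a free slot

    private final ByteBuffer table;                     // native memory, outside the Java heap
    private final int capacity;

    OffHeapPositionCache(int capacity) {
        this.capacity = capacity;
        this.table = ByteBuffer.allocateDirect(capacity * ENTRY_BYTES);
        for (int i = 0; i < capacity; i++) {
            table.putLong(i * ENTRY_BYTES, EMPTY);
        }
    }

    /** Home slot for a key; bits are mixed so nearby keys spread out. */
    private int home(long key) {
        long spread = key * 0x9E3779B97F4A7C15L;
        return (int) Math.floorMod(spread, (long) capacity);
    }

    /** Stores key -> position; returns false if the table is full. */
    boolean put(long key, long position) {
        if (key == EMPTY) {
            throw new IllegalArgumentException("reserved key");
        }
        int start = home(key);
        for (int i = 0; i < capacity; i++) {
            int offset = ((start + i) % capacity) * ENTRY_BYTES;
            long existing = table.getLong(offset);
            if (existing == EMPTY || existing == key) {
                table.putLong(offset, key);
                table.putLong(offset + 8, position);
                return true;
            }
        }
        return false;
    }

    /** Returns the cached position, or -1 if the key is absent. */
    long get(long key) {
        int start = home(key);
        for (int i = 0; i < capacity; i++) {
            int offset = ((start + i) % capacity) * ENTRY_BYTES;
            long existing = table.getLong(offset);
            if (existing == EMPTY) {
                return -1;                              // probe chain ended: not cached
            }
            if (existing == key) {
                return table.getLong(offset + 8);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        OffHeapPositionCache cache = new OffHeapPositionCache(1024);
        cache.put(42L, 123_456L);
        System.out.println(cache.get(42L)); // 123456
        System.out.println(cache.get(7L));  // -1 (not cached)
    }
}
```

A real key cache also needs eviction and variable-sized values, which is what a library such as OHC provides.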

h3. one big clustering key

user, native, cql3, user 
[profile|https://gist.github.com/snazy/b6c160c65001eb074784]

[blade_11_b|http://cstar.datastax.com/tests/id/7f7265a2-3aee-11e5-b022-42010af0688f]
[bdplab|http://cstar.datastax.com/tests/id/af344e3e-3af0-11e5-b379-42010af0688f]

h3. big clustering over two clustering columns

user, native, cql3, user 
[profile|https://gist.github.com/snazy/351156424929d868baf3]

[blade_11_b|http://cstar.datastax.com/tests/id/e919725a-3b68-11e5-b590-42010af0688f]
[bdplab|http://cstar.datastax.com/tests/id/36f1d0ee-3b8c-11e5-9c9e-42010af0688f]

h3. big clustering over two clustering columns, reduced threads for pure-write and mixed operations

user, native, cql3, user 
[profile|https://gist.github.com/snazy/e4579499f61911802fcd]

[blade_11_b|http://cstar.datastax.com/tests/id/36f1d0ee-3b8c-11e5-9c9e-42010af0688f]
[bdplab|http://cstar.datastax.com/tests/id/07754e44-3b8d-11e5-9c9e-42010af0688f]

h3. stress _write_, _mixed_, _read_

[blade_11_b|http://cstar.datastax.com/tests/id/def04c20-3b8d-11e5-9c9e-42010af0688f]
[bdplab|http://cstar.datastax.com/tests/id/f3f5c172-3b8d-11e5-9c9e-42010af0688f]

h2. Git branch + cassci

[git branch|https://github.com/snazy/cassandra/tree/9738-key-cache-ohc]
[unit tests|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-9738-key-cache-ohc-testall/]
[dtests|http://cassci.datastax.com/view/Dev/view/snazy/job/snazy-9738-key-cache-ohc-dtest/]

I didn’t see any failed tests related to this patch.

There is also another branch on GitHub that contains [optimizations not purely related to key-cache|https://github.com/snazy/cassandra/tree/9738-key-cache-ref]. 
{{9738-key-cache-ohc}} is based on that branch and contains:
* “singletons” for key-cache {{o.a.c.db.SerializationHeader}} instances (dynamically extended, if required)
* “singletons” for {{IndexInfo.Serializer}} in {{o.a.c.db.Serializers}} (dynamically extended, if required)
* “singletons” for {{BigVersion}} instances for {{ma}}, {{la}}, {{ka}}, {{jb}}; other versions get temporary objects (some tests use older sstable versions)
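Here is a minimal sketch of the “singleton” idea. It assumes a hypothetical {{InstanceRegistry}} keyed by version string, which is not a class in the branch. Versions listed up front share one instance. On a miss, the registry either adds a new shared instance (the "dynamically extended" case) or returns a temporary object that is not retained (the {{BigVersion}} case for older sstable versions).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical registry (not a class in the 9738 branches) that hands out
 * shared per-version instances instead of allocating one per use.
 */
final class InstanceRegistry<K, V> {
    private final ConcurrentHashMap<K, V> shared = new ConcurrentHashMap<>();
    private final Function<K, V> factory;
    private final boolean extendOnMiss;

    @SafeVarargs
    InstanceRegistry(Function<K, V> factory, boolean extendOnMiss, K... preallocated) {
        this.factory = factory;
        this.extendOnMiss = extendOnMiss;
        for (K key : preallocated) {
            shared.put(key, factory.apply(key));
        }
    }

    V get(K key) {
        V instance = shared.get(key);
        if (instance != null) {
            return instance;                               // shared "singleton"
        }
        if (extendOnMiss) {
            return shared.computeIfAbsent(key, factory);   // register for later reuse
        }
        return factory.apply(key);                         // temporary, not retained
    }

    public static void main(String[] args) {
        // Stand-in for BigVersion: a fixed set of shared versions, temporaries otherwise.
        InstanceRegistry<String, StringBuilder> versions =
            new InstanceRegistry<>(v -> new StringBuilder(v), false, "ma", "la", "ka", "jb");
        System.out.println(versions.get("ma") == versions.get("ma")); // true
        System.out.println(versions.get("jd") == versions.get("jd")); // false
    }
}
```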

h2. Further optimisations

There are some things that can be optimised in the future:
* Currently we need to serialise the keyspace and CF names _and_ the cfId. This is 
necessary because secondary indexes inherit the cfId of their base table. If 
all tables and all secondary indexes had unique IDs, we could omit KS and CF 
name serialisation (and its weird {{cfName.contains(".")}} 2i detection). This can 
be built with or after the 2i API redesign.
* The full directory path is serialised. This appears to be less expensive than 
iterating over the whole {{List}} of sstables and identifying an sstable by its 
generation.
* As [~benedict] suggested, we can switch to very tiny key-cache entries and 
also omit serialisation of {{IndexInfo}}.
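For illustration only, here is a hypothetical layout of a key-cache key that carries the fields listed in the first two items: keyspace name, CF name, cfId, full sstable directory path and partition key. It also includes the name-based 2i heuristic. {{KeyCacheKeySketch}}, the field order and the use of {{DataOutputStream.writeUTF}} are assumptions, not the branch's actual format. The sketch also assumes index tables are named "<base table>.<index>".

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.UUID;

/**
 * Hypothetical key-cache key layout (field order and encoding are assumptions):
 * KS name, CF name, cfId, sstable directory path, partition key.
 */
final class KeyCacheKeySketch {
    static byte[] serialize(String ksName, String cfName, UUID cfId,
                            String sstableDirectory, byte[] partitionKey) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            // Names are still needed: a 2i shares its base table's cfId.
            out.writeUTF(ksName);                    // 2-byte length + modified UTF-8
            out.writeUTF(cfName);
            out.writeLong(cfId.getMostSignificantBits());
            out.writeLong(cfId.getLeastSignificantBits());
            // Full path locates the sstable without scanning the sstable list.
            out.writeUTF(sstableDirectory);
            out.writeInt(partitionKey.length);
            out.write(partitionKey);
        } catch (IOException e) {
            throw new UncheckedIOException(e);       // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    /** Name-based 2i detection, assuming index tables are named "<base table>.<index>". */
    static boolean isSecondaryIndex(String cfName) {
        return cfName.contains(".");
    }

    public static void main(String[] args) {
        byte[] key = serialize("ks1", "users", UUID.randomUUID(), "/data/ks1/users", new byte[] {1, 2, 3});
        System.out.println(key.length);                                // 52
        System.out.println(isSecondaryIndex("users.users_email_idx")); // true
    }
}
```

With this layout, the two names alone cost 12 of the 52 bytes in the example. Unique IDs for every table and index would save these bytes in every entry.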


> Migrate key-cache to be fully off-heap
> --------------------------------------
>
>                 Key: CASSANDRA-9738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>             Fix For: 3.0.0 rc1
>
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
