[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280945#comment-14280945 ]
Ariel Weisberg commented on CASSANDRA-7438:
-------------------------------------------
I ran the benchmark on the develop branch today using a c3.8xlarge and profiled
with Flight Recorder. There is definitely some contention on the lock in JNA. I
also see a little in AbstractQueuedSynchronizer from locking the segments,
along with some park/unpark activity.
I built jemalloc (-march=native --disable-fill --disable-stats); the Ubuntu
package compiles at -O2 instead of -O3. With 30 threads I get full CPU
utilization if I increase the number of segments to 256; otherwise it hovers
around 2600%. Going to 256 segments also cuts the number of contention events
in the profiler in half.
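To illustrate why more segments reduce contention, here is a minimal lock-striping sketch in Java. It is not OHC's implementation; `StripedMap` and its methods are hypothetical. Each segment gets its own `ReentrantLock` (built on `AbstractQueuedSynchronizer`), so threads only block each other, and park/unpark, when their keys hash to the same segment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock-striped map: a sketch of the segmenting idea, not OHC's code.
// Each segment has its own ReentrantLock, so two threads only contend
// when their keys map to the same segment.
final class StripedMap<K, V> {
    private final ReentrantLock[] locks;
    private final Map<K, V>[] segments;
    private final int mask;

    @SuppressWarnings("unchecked")
    StripedMap(int segmentCount) {
        // A power-of-two count lets us pick a segment with a bit mask.
        if (segmentCount <= 0 || Integer.bitCount(segmentCount) != 1) {
            throw new IllegalArgumentException("segmentCount must be a power of two");
        }
        locks = new ReentrantLock[segmentCount];
        segments = (Map<K, V>[]) new Map[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            locks[i] = new ReentrantLock();
            segments[i] = new HashMap<>();
        }
        mask = segmentCount - 1;
    }

    // Mix the high hash bits into the low bits before masking.
    int segmentFor(Object key) {
        int h = key.hashCode();
        return (h ^ (h >>> 16)) & mask;
    }

    V get(K key) {
        int s = segmentFor(key);
        locks[s].lock();
        try {
            return segments[s].get(key);
        } finally {
            locks[s].unlock();
        }
    }

    V put(K key, V value) {
        int s = segmentFor(key);
        locks[s].lock();
        try {
            return segments[s].put(key, value);
        } finally {
            locks[s].unlock();
        }
    }

    int segmentCount() {
        return locks.length;
    }
}
```

With keys spread evenly, two concurrent operations land on the same segment with a probability of roughly 1/segmentCount, so raising the count to 256 should make lock collisions among 30 threads rarer, which fits the drop in contention seen in the profiler.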
The workload settings you ran with resulted in a lot of cache (ohcache, not CPU
cache) misses. I think a real workload where the cache is useful will have more
hits.
One note about the benchmark: building the histogram of buckets is not a
lightweight operation, so I think it should be off by default. I removed it for
my testing. Otherwise it looks OK. Using the Timer as shared state in a
micro-benchmark is probably not the way to go; I would have a timer per driver
thread and then aggregate.
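A sketch of the per-thread timer idea, in Java. `LatencyStats` and `PerThreadTimers` are hypothetical stand-ins for a metrics Timer and a benchmark driver; they only track count, total, min and max, not a full histogram. Each driver thread writes to its own unsynchronized `LatencyStats`, and the results are merged after `Thread.join()`, which guarantees the main thread sees every worker's final values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a metrics Timer. Deliberately NOT thread-safe:
// only the owning driver thread writes to it while the benchmark runs.
final class LatencyStats {
    long count;
    long totalNanos;
    long minNanos = Long.MAX_VALUE;
    long maxNanos = Long.MIN_VALUE;

    void record(long nanos) {
        count++;
        totalNanos += nanos;
        minNanos = Math.min(minNanos, nanos);
        maxNanos = Math.max(maxNanos, nanos);
    }

    // Fold another thread's results into this one after the run has finished.
    void merge(LatencyStats other) {
        count += other.count;
        totalNanos += other.totalNanos;
        minNanos = Math.min(minNanos, other.minNanos);
        maxNanos = Math.max(maxNanos, other.maxNanos);
    }

    double meanNanos() {
        return count == 0 ? 0.0 : (double) totalNanos / count;
    }
}

final class PerThreadTimers {
    // Runs `threads` driver threads; each times `opsPerThread` calls of `op`
    // into its own LatencyStats. The per-thread stats are merged only after join().
    static LatencyStats run(int threads, int opsPerThread, Runnable op) {
        List<LatencyStats> perThread = new ArrayList<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            LatencyStats stats = new LatencyStats();
            perThread.add(stats);
            Thread worker = new Thread(() -> {
                for (int i = 0; i < opsPerThread; i++) {
                    long start = System.nanoTime();
                    op.run();
                    stats.record(System.nanoTime() - start);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join() makes the worker's writes visible to this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for driver threads", e);
            }
        }
        LatencyStats total = new LatencyStats();
        for (LatencyStats stats : perThread) {
            total.merge(stats);
        }
        return total;
    }
}
```

Because nothing is shared during the timed loop, the measurement itself adds no lock or cache-line contention between driver threads.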
I am running 1-30 threads and it will take a few hours to finish. I am going to
look into benchmarking inside C* and comparing the existing cache
implementation to OHC now.
I used the following script, which gave me mostly cache hits and filled up
quite a bit of RAM. It takes a minute or two to fill the cache.
{noformat}
#!/bin/sh
# Preload the locally built jemalloc for the whole JVM process.
# -XX:+UnlockCommercialFeatures is needed for Flight Recorder on Oracle JDK 7/8.
LD_PRELOAD=~/jemalloc-3.6.0/lib/libjemalloc.so.1 \
java -Xmx8g -XX:+UnlockCommercialFeatures -XX:+FlightRecorder \
-DDISABLE_JEMALLOC=true \
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=7091 \
-Dcom.sun.management.jmxremote.local.only=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Djava.rmi.server.hostname=ec2-54-172-234-230.compute-1.amazonaws.com \
-jar ohc-benchmark/target/ohc-benchmark-0.3-SNAPSHOT.jar \
-rkd 'gaussian(1..15000000,2)' -wkd 'gaussian(1..15000000,2)' \
-vs 'gaussian(1024..4096,2)' -r .9 -cap 32000000000 \
-d 120 -t 30 \
-sc 256
{noformat}
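The key distributions above are what keep the run mostly in cache. The sketch below assumes `gaussian(min..max,stdvrng)` follows the cassandra-stress convention (mean = (min + max) / 2, stdev = (mean - min) / stdvrng); I have not confirmed that ohc-benchmark parses it the same way, and `GaussianKeys` is a hypothetical class. Out-of-range draws are rejected and redrawn in this sketch.

```java
import java.util.Random;

// Hypothetical key sampler for a spec like 'gaussian(1..15000000,2)'.
// ASSUMPTION: cassandra-stress style semantics:
//   mean = (min + max) / 2, stdev = (mean - min) / stdvrng.
final class GaussianKeys {
    private final long min;
    private final long max;
    private final double mean;
    private final double stdev;
    private final Random random;

    GaussianKeys(long min, long max, double stdvrng, long seed) {
        this.min = min;
        this.max = max;
        this.mean = (min + max) / 2.0;
        this.stdev = (mean - min) / stdvrng;
        this.random = new Random(seed);
    }

    // Redraw until the value falls inside [min, max]; rejection instead of
    // clamping keeps the range edges from being over-represented.
    long next() {
        while (true) {
            long v = Math.round(mean + random.nextGaussian() * stdev);
            if (v >= min && v <= max) {
                return v;
            }
        }
    }

    double stdev() {
        return stdev;
    }
}
```

Under that assumption, `gaussian(1..15000000,2)` has a standard deviation of about 3.75 million keys, so requests concentrate on the middle of the key range. As a rough capacity check that ignores per-entry overhead: values averaging (1024 + 4096) / 2 = 2560 bytes across all 15,000,000 keys would need about 38.4 GB, somewhat more than the 32 GB `-cap`, so most, but not all, keys can stay resident once the cache is warm.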
256 segments, jemalloc LD_PRELOAD, -DDISABLE_JEMALLOC=true
{noformat}
Reads : one/five/fifteen/mean: 2503894/2143858/2036336/2459949
count: 295258886
min/max/mean/stddev: 0.00047/ 0.76172/ 0.00652/ 0.03865
75/95/98/99/999/median: 0.00439/ 0.00697/ 0.01147/ 0.03458/ 0.75864/ 0.00342
Writes : one/five/fifteen/mean: 278134/238242/226326/273275
count: 32800525
min/max/mean/stddev: 0.00176/ 0.89665/ 0.00945/ 0.03986
75/95/98/99/999/median: 0.00719/ 0.01180/ 0.01816/ 0.11640/ 0.89006/ 0.00556
{noformat}
256 segments, jemalloc via JNA
{noformat}
Reads : one/five/fifteen/mean: 2343872/1458688/1159829/2387622
count: 286635526
min/max/mean/stddev: 0.00054/ 0.97114/ 0.00756/ 0.04664
75/95/98/99/999/median: 0.00435/ 0.00675/ 0.00985/ 0.05139/ 0.95959/ 0.00341
Writes : one/five/fifteen/mean: 260376/162076/128883/265250
count: 31843705
min/max/mean/stddev: 0.00267/ 0.70586/ 0.01502/ 0.05161
75/95/98/99/999/median: 0.01049/ 0.01695/ 0.04193/ 0.36639/ 0.70331/ 0.00859
{noformat}
default segments, jemalloc LD_PRELOAD, -DDISABLE_JEMALLOC=true
{noformat}
Reads : one/five/fifteen/mean: 2148677/1630379/1448226/2202878
count: 264549288
min/max/mean/stddev: 0.00035/ 0.66081/ 0.00820/ 0.03519
75/95/98/99/999/median: 0.00435/ 0.01247/ 0.05423/ 0.20834/ 0.65286/ 0.00323
Writes : one/five/fifteen/mean: 238699/180945/160641/244767
count: 29395103
min/max/mean/stddev: 0.00172/ 0.39821/ 0.01120/ 0.03079
75/95/98/99/999/median: 0.00805/ 0.02124/ 0.08665/ 0.18473/ 0.39776/ 0.00574
{noformat}
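To put the three runs side by side, this small Java helper (hypothetical, not part of the benchmark) computes relative read throughput from the reported mean rates and checks the run length implied by the read count. The benchmark does not print rate units; reading them as operations per second is an assumption.

```java
// Hypothetical helper; the figures are copied from the three runs above.
final class CompareRuns {
    // Percentage by which `a` exceeds `b`.
    static double pctGain(double a, double b) {
        return (a / b - 1.0) * 100.0;
    }

    // Elapsed time implied by an operation count and its mean rate.
    static double impliedSeconds(long count, double meanRate) {
        return count / meanRate;
    }

    public static void main(String[] args) {
        // Mean read rates: 256 segments + LD_PRELOAD, 256 segments + JNA,
        // default segments + LD_PRELOAD.
        double preload256 = 2459949;
        double jna256 = 2387622;
        double preloadDefault = 2202878;
        System.out.printf("256 vs default segments (LD_PRELOAD): %+.1f%%%n",
                pctGain(preload256, preloadDefault));
        System.out.printf("LD_PRELOAD vs JNA (256 segments): %+.1f%%%n",
                pctGain(preload256, jna256));
        System.out.printf("implied run length: %.1f s%n",
                impliedSeconds(295258886L, preload256));
    }
}
```

It prints +11.7% for 256 versus default segments, +3.0% for LD_PRELOAD versus JNA, and an implied run length of 120.0 s, consistent with `-d 120` if the rates are per second. The write rates show the same pattern (about +11.6% and +3.0%).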
> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
> Key: CASSANDRA-7438
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Environment: Linux
> Reporter: Vijay
> Assignee: Robert Stupp
> Labels: performance
> Fix For: 3.0
>
> Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is only partially off heap; keys are still stored
> in the JVM heap as ByteBuffers (BB).
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better
> results, but this requires careful tuning.
> * Memory overhead for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off
> heap and use JNI to interact with the cache. We might want to ensure that the
> new implementation matches the existing APIs (ICache), and the implementation
> needs to have safe memory access, low memory overhead and as few memcpys as
> possible.
> We might also want to make this cache configurable.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)