[ https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14280945#comment-14280945 ]

Ariel Weisberg commented on CASSANDRA-7438:
-------------------------------------------

I ran the benchmark on the develop branch today using a c3.8xlarge and profiled 
with flight recorder. There is definitely some contention on the lock in JNA. I 
also see a little in AbstractQueuedSynchronizer from locking the segments, 
along with some park/unpark activity.

I built jemalloc (-march=native --disable-fill --disable-stats). The Ubuntu 
package compiles at -O2 instead of -O3. I am getting full utilization across 30 
threads if I increase the number of segments to 256; otherwise it hovers around 
2600% (with 30 threads). Increasing the segment count also halves the number of 
instances of contention in the profiler.
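
A minimal sketch of why more segments means less lock contention, assuming a 
simple lock-striped map. StripedMapSketch and its methods are hypothetical 
names for illustration; this is not OHC's code, which keeps its data off heap. 
Each key hashes to one segment and only that segment's lock is taken, so with 
256 segments two threads rarely wait on the same lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of lock striping; not taken from OHC.
final class StripedMapSketch<K, V> {
    private final ReentrantLock[] locks;   // one lock per segment
    private final Map<K, V>[] segments;    // one plain map per segment
    private final int mask;                // segmentCount - 1, for cheap modulo

    @SuppressWarnings("unchecked")
    StripedMapSketch(int segmentCount) {
        if (segmentCount <= 0 || Integer.bitCount(segmentCount) != 1) {
            throw new IllegalArgumentException("segmentCount must be a power of two");
        }
        locks = new ReentrantLock[segmentCount];
        segments = (Map<K, V>[]) new Map[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            locks[i] = new ReentrantLock();
            segments[i] = new HashMap<>();
        }
        mask = segmentCount - 1;
    }

    // Spread the hash bits, then pick a segment index in [0, segmentCount).
    int segmentFor(Object key) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        return h & mask;
    }

    V put(K key, V value) {
        int s = segmentFor(key);
        locks[s].lock();                   // only this segment is blocked
        try {
            return segments[s].put(key, value);
        } finally {
            locks[s].unlock();
        }
    }

    V get(K key) {
        int s = segmentFor(key);
        locks[s].lock();
        try {
            return segments[s].get(key);
        } finally {
            locks[s].unlock();
        }
    }
}
```

ReentrantLock is built on AbstractQueuedSynchronizer, so contention on a 
segment lock of this kind would show up under that class in a profiler.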

The workload settings you ran with resulted in a lot of cache (ohcache, not CPU 
cache) misses. I think a real workload where the cache is useful will have more 
hits.

One note about the benchmark: building the histogram of buckets is not a 
lightweight operation. I think that should be off by default. I removed it for 
my testing. Otherwise it looks OK. Using the Timer as shared state in a 
micro-benchmark is probably not the way to go. I would have a timer per driver 
thread and then aggregate.
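
A rough sketch of the timer-per-driver-thread idea, assuming plain 
count/min/max/mean statistics rather than the benchmark's actual Timer. 
LatencyStats and PerThreadTimerSketch are hypothetical names, not part of 
ohc-benchmark. Each thread records into its own object, and the results are 
merged only after the threads finish, so the hot path touches no shared state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper: latency statistics owned by exactly one thread.
final class LatencyStats {
    long count;
    long totalNanos;
    long minNanos = Long.MAX_VALUE;
    long maxNanos = Long.MIN_VALUE;

    // Called only by the owning thread, so no synchronization is needed.
    void record(long nanos) {
        count++;
        totalNanos += nanos;
        if (nanos < minNanos) minNanos = nanos;
        if (nanos > maxNanos) maxNanos = nanos;
    }

    // Fold another thread's finished statistics into this one.
    void merge(LatencyStats other) {
        count += other.count;
        totalNanos += other.totalNanos;
        minNanos = Math.min(minNanos, other.minNanos);
        maxNanos = Math.max(maxNanos, other.maxNanos);
    }

    double meanNanos() {
        return count == 0 ? 0.0 : (double) totalNanos / count;
    }
}

final class PerThreadTimerSketch {
    // Runs op opsPerThread times on each of `threads` driver threads and
    // returns the aggregated statistics.
    static LatencyStats run(int threads, int opsPerThread, Runnable op) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<LatencyStats>> futures = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                futures.add(pool.submit(() -> {
                    LatencyStats local = new LatencyStats(); // thread-private
                    for (int n = 0; n < opsPerThread; n++) {
                        long start = System.nanoTime();
                        op.run();
                        local.record(System.nanoTime() - start);
                    }
                    return local;
                }));
            }
            LatencyStats total = new LatencyStats();
            for (Future<LatencyStats> f : futures) {
                total.merge(f.get()); // get() waits for the driver thread
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Percentiles would need a mergeable histogram per thread instead of these 
simple counters, but the record-locally-then-merge structure stays the same.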

I am running 1-30 threads and it will take a few hours to finish. I am going to 
look into benchmarking inside C* and comparing the existing cache 
implementation to OHC now.

I used this script, which gave me mostly cache hits and filled up quite a bit 
of RAM. It takes a minute or two to fill the cache.
{noformat}
#!/bin/sh
# Run the OHC benchmark with jemalloc preloaded, Flight Recorder enabled,
# and remote JMX open on port 7091 (30 threads, 256 segments).
LD_PRELOAD=~/jemalloc-3.6.0/lib/libjemalloc.so.1 \
java -Xmx8g -XX:+UnlockCommercialFeatures -XX:+FlightRecorder \
-DDISABLE_JEMALLOC=true \
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=7091 \
-Dcom.sun.management.jmxremote.local.only=false \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false \
-Djava.rmi.server.hostname=ec2-54-172-234-230.compute-1.amazonaws.com \
-jar ohc-benchmark/target/ohc-benchmark-0.3-SNAPSHOT.jar \
-rkd 'gaussian(1..15000000,2)' -wkd 'gaussian(1..15000000,2)' \
-vs 'gaussian(1024..4096,2)' -r .9 -cap 32000000000 \
-d 120 -t 30 \
-sc 256
{noformat}

256 segments, jemalloc LD_PRELOAD, -DDISABLE_JEMALLOC=true
{noformat}
     Reads     : one/five/fifteen/mean:  2503894/2143858/2036336/2459949
                 count:                   295258886 
                 min/max/mean/stddev:     0.00047/ 0.76172/ 0.00652/ 0.03865
                 75/95/98/99/999/median:  0.00439/ 0.00697/ 0.01147/ 0.03458/ 0.75864/ 0.00342
     Writes    : one/five/fifteen/mean:  278134/238242/226326/273275
                 count:                    32800525 
                 min/max/mean/stddev:     0.00176/ 0.89665/ 0.00945/ 0.03986
                 75/95/98/99/999/median:  0.00719/ 0.01180/ 0.01816/ 0.11640/ 0.89006/ 0.00556
{noformat}

256 segments, jemalloc via jna
{noformat}
     Reads     : one/five/fifteen/mean:  2343872/1458688/1159829/2387622
                 count:                   286635526 
                 min/max/mean/stddev:     0.00054/ 0.97114/ 0.00756/ 0.04664
                 75/95/98/99/999/median:  0.00435/ 0.00675/ 0.00985/ 0.05139/ 0.95959/ 0.00341
     Writes    : one/five/fifteen/mean:  260376/162076/128883/265250
                 count:                    31843705 
                 min/max/mean/stddev:     0.00267/ 0.70586/ 0.01502/ 0.05161
                 75/95/98/99/999/median:  0.01049/ 0.01695/ 0.04193/ 0.36639/ 0.70331/ 0.00859
{noformat}

default segments, jemalloc LD_PRELOAD, -DDISABLE_JEMALLOC=true
{noformat}
     Reads     : one/five/fifteen/mean:  2148677/1630379/1448226/2202878
                 count:                   264549288 
                 min/max/mean/stddev:     0.00035/ 0.66081/ 0.00820/ 0.03519
                 75/95/98/99/999/median:  0.00435/ 0.01247/ 0.05423/ 0.20834/ 0.65286/ 0.00323
     Writes    : one/five/fifteen/mean:  238699/180945/160641/244767
                 count:                    29395103 
                 min/max/mean/stddev:     0.00172/ 0.39821/ 0.01120/ 0.03079
                 75/95/98/99/999/median:  0.00805/ 0.02124/ 0.08665/ 0.18473/ 0.39776/ 0.00574
{noformat}

> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Robert Stupp
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is partially off heap; keys are still stored in 
> the JVM heap as BB.
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better 
> results, but this requires careful tuning.
> * Memory overhead for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off 
> heap and use JNI to interact with the cache. We might want to ensure that the 
> new implementation matches the existing APIs (ICache), and the implementation 
> needs to have safe memory access, low memory overhead, and fewer memcpys 
> (as much as possible).
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
