[ https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17821200#comment-17821200 ]
Stefan Miklosovic commented on CASSANDRA-19429:
-----------------------------------------------

Well, I do see some speedup, just not 2x / 3x. I think the effect of this change is amplified the more powerful the machine a node runs on.

Without this patch:

{noformat}
Op rate                   : 89,392 op/s [READ: 80,439 op/s, WRITE: 8,953 op/s]
Partition rate            : 89,392 pk/s [READ: 80,439 pk/s, WRITE: 8,953 pk/s]
Row rate                  : 89,392 row/s [READ: 80,439 row/s, WRITE: 8,953 row/s]
Latency mean              : 2.2 ms [READ: 2.2 ms, WRITE: 2.2 ms]
Latency median            : 1.6 ms [READ: 1.6 ms, WRITE: 1.6 ms]
Latency 95th percentile   : 5.8 ms [READ: 5.8 ms, WRITE: 5.9 ms]
Latency 99th percentile   : 10.6 ms [READ: 10.6 ms, WRITE: 10.7 ms]
Latency 99.9th percentile : 23.5 ms [READ: 23.4 ms, WRITE: 23.8 ms]
Latency max               : 180.5 ms [READ: 180.5 ms, WRITE: 122.7 ms]
Total partitions          : 5,408,128 [READ: 4,866,473, WRITE: 541,655]
Total errors              : 0 [READ: 0, WRITE: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             : 0.0 seconds
Avg GC time               : NaN ms
StdDev GC time            : 0.0 ms
Total operation time      : 00:01:00
{noformat}

With this patch, two independent runs:

{noformat}
Op rate                   : 119,782 op/s [READ: 107,849 op/s, WRITE: 11,933 op/s]
Partition rate            : 119,782 pk/s [READ: 107,849 pk/s, WRITE: 11,933 pk/s]
Row rate                  : 119,782 row/s [READ: 107,849 row/s, WRITE: 11,933 row/s]
Latency mean              : 1.7 ms [READ: 1.6 ms, WRITE: 1.7 ms]
Latency median            : 1.3 ms [READ: 1.3 ms, WRITE: 1.4 ms]
Latency 95th percentile   : 3.8 ms [READ: 3.8 ms, WRITE: 4.0 ms]
Latency 99th percentile   : 7.7 ms [READ: 7.7 ms, WRITE: 8.0 ms]
Latency 99.9th percentile : 13.7 ms [READ: 13.7 ms, WRITE: 14.1 ms]
Latency max               : 114.6 ms [READ: 61.5 ms, WRITE: 114.6 ms]
Total partitions          : 7,188,152 [READ: 6,472,051, WRITE: 716,101]
Total errors              : 0 [READ: 0, WRITE: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             : 0.0 seconds
Avg GC time               : NaN ms
StdDev GC time            : 0.0 ms
Total operation time      : 00:01:00
{noformat}

{noformat}
Results:
Op rate                   : 104,456 op/s [READ: 94,016 op/s, WRITE: 10,440 op/s]
Partition rate            : 104,456 pk/s [READ: 94,016 pk/s, WRITE: 10,440 pk/s]
Row rate                  : 104,456 row/s [READ: 94,016 row/s, WRITE: 10,440 row/s]
Latency mean              : 1.9 ms [READ: 1.9 ms, WRITE: 2.0 ms]
Latency median            : 1.5 ms [READ: 1.4 ms, WRITE: 1.5 ms]
Latency 95th percentile   : 4.7 ms [READ: 4.6 ms, WRITE: 4.8 ms]
Latency 99th percentile   : 8.6 ms [READ: 8.6 ms, WRITE: 8.8 ms]
Latency 99.9th percentile : 13.9 ms [READ: 13.8 ms, WRITE: 14.1 ms]
Latency max               : 85.4 ms [READ: 77.2 ms, WRITE: 85.4 ms]
Total partitions          : 6,268,822 [READ: 5,642,258, WRITE: 626,564]
Total errors              : 0 [READ: 0, WRITE: 0]
Total GC count            : 0
Total GC memory           : 0.000 KiB
Total GC time             : 0.0 seconds
Avg GC time               : NaN ms
StdDev GC time            : 0.0 ms
Total operation time      : 00:01:00
{noformat}

So the speedup is somewhere around 20% (about 34% and 17% in op rate for the two patched runs above), which is already quite nice.

> Remove lock contention generated by getCapacity function in SSTableReader
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19429
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/SSTable
>            Reporter: Dipietro Salvatore
>            Assignee: Dipietro Salvatore
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x
>
>         Attachments: Screenshot 2024-02-26 at 10.27.10.png,
>                      asprof_cass4.1.3__lock_20240216052912lock.html
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock
> acquires is measured in the `getCapacity` function from
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04),
> this limits the CPU utilization of the system to under 50% when testing at
> full load and therefore limits the achieved throughput.
>
> Removing the lock contention from the SSTableReader.java file by replacing
> the call to `getCapacity` with `size` achieves up to a 2.95x increase in
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && \
> CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && \
> tools/bin/cassandra-stress write n=10000000 cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph file=cload.html && \
> bin/nodetool compact keyspace1 && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE -rate threads=406 -node localhost -log file=result.log -graph file=graph.html
> {code}
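
To make the contention argument in the issue description concrete, here is a minimal Java sketch. It is an illustration under stated assumptions, not the Cassandra code or the actual patch: `ToyCache`, `policyLock`, `lockAcquires` and `hammer` are hypothetical names, and the only point is that a capacity lookup guarded by a lock takes that lock on every call, while a `size()` read on a `ConcurrentHashMap` does not.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

public class LockContentionSketch {

    /** Hypothetical toy cache for illustration; NOT Cassandra's InstrumentingCache. */
    public static final class ToyCache {
        private final ConcurrentHashMap<String, String> entries = new ConcurrentHashMap<>();
        private final ReentrantLock policyLock = new ReentrantLock();
        private final AtomicLong lockAcquires = new AtomicLong();
        private final long capacity;

        public ToyCache(long capacity) {
            this.capacity = capacity;
        }

        /** Assumed locked read path: every caller serializes on policyLock. */
        public long getCapacity() {
            policyLock.lock();
            try {
                lockAcquires.incrementAndGet(); // count acquisitions, like a lock profiler would
                return capacity;
            } finally {
                policyLock.unlock();
            }
        }

        /** Lock-free read path: ConcurrentHashMap.size() does not take policyLock. */
        public int size() {
            return entries.size();
        }

        public void put(String key, String value) {
            entries.put(key, value);
        }

        public long lockAcquires() {
            return lockAcquires.get();
        }
    }

    /** Runs the given call callsPerThread times on each of several threads and waits for them. */
    public static void hammer(int threads, int callsPerThread, Runnable call) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < callsPerThread; j++) {
                    call.run();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
    }

    public static void main(String[] args) {
        ToyCache cache = new ToyCache(1024);
        cache.put("k", "v");

        hammer(8, 10_000, cache::getCapacity);
        System.out.println("lock acquires after getCapacity calls: " + cache.lockAcquires());

        hammer(8, 10_000, cache::size);
        System.out.println("lock acquires after size calls: " + cache.lockAcquires());
    }
}
```

Both printed counts should be 80000: the 80,000 `getCapacity` calls each take the lock once, and the 80,000 `size` calls add nothing. Note that `size` and `getCapacity` answer different questions, so whether one can stand in for the other depends on how the caller uses the result; the sketch makes no claim about that.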