[ 
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825465#comment-17825465
 ] 

Jon Haddad commented on CASSANDRA-19429:
----------------------------------------

I'm using the latest 4.1 and the branch off 4.1 I found above.

Cassandra 4.1 ships configured to use ParNew + CMS, but there's not really a 
good reason to use that.  We changed to G1 in 5.0; it's far superior to ParNew 
/ CMS for general usage.  Both 4.1 and the patched branch were set up using G1.

I built my clusters using easy-cass-lab (formerly tlp-cluster).  Here's the 
cassandra.patch.yaml I used, along with the JVM options and a bunch of other 
settings.  You can see the workloads I ran and my flame graphs in a previous 
comment.

I ran these tests on the same node, with the exact same dataset.  In fact, I 
ran many tests, switching back and forth between 4.1 and the patched branch to 
try to find some way of replicating what you found, but I was unable to.

 
{noformat}
---
cluster_name: "Test Cluster"
num_tokens: 4
seed_provider:
  class_name: "org.apache.cassandra.locator.SimpleSeedProvider"
  parameters:
    seeds: "172.31.42.182"
hints_directory: "/mnt/cassandra/hints"
data_file_directories:
- "/mnt/cassandra/data"
commitlog_directory: "/mnt/cassandra/commitlog"
concurrent_reads: 128
concurrent_writes: 64
trickle_fsync: true
endpoint_snitch: "Ec2Snitch"
memtable_heap_space: 16MiB{noformat}
 
{noformat}
### G1 Settings
## Use the Hotspot garbage-first collector.
-XX:+UseG1GC
-XX:+ParallelRefProcEnabled
-XX:MaxTenuringThreshold=4
-XX:G1HeapRegionSize=16m
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=50
-XX:G1MaxNewSizePercent=70
-Xmx24G
-Xms24G{noformat}
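
An options block like this is easy to damage when it is copied around, for example when two flags end up glued together on one line. The following is a minimal sketch of a sanity check, not part of the setup described here. It assumes the settings live one per line in a file (the name `jvm-g1.options` is hypothetical) and that option lines start with `-`. The helper names are illustrative only.

```shell
# list_jvm_opts FILE: print only the option lines (those starting with '-'),
# skipping comments and blank lines.
list_jvm_opts() {
  grep -E '^-' "$1"
}

# find_fused_opts FILE: print option lines where a second flag appears to be
# glued onto the first, e.g. "-XX:SomeFlag=1-Xmx1G".
find_fused_opts() {
  list_jvm_opts "$1" | grep -E '.-(XX:|Xm[sxn])'
}

# Hypothetical usage against a file holding the settings above:
#   find_fused_opts jvm-g1.options && echo "fix the lines listed above"
#   java $(list_jvm_opts jvm-g1.options) -XX:+PrintFlagsFinal -version \
#     | grep -E 'UseG1GC|G1NewSizePercent|G1MaxNewSizePercent|G1HeapRegionSize'
```

The commented `java` invocation only works on a host with a JDK on the PATH and enough memory for the configured heap. It asks the JVM to print the values it actually resolved for the G1 flags.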
 
{noformat}
sudo sysctl kernel.perf_event_paranoid=1
sudo sysctl kernel.kptr_restrict=0
echo 0 | sudo tee /proc/sys/vm/zone_reclaim_mode{noformat}
 
{noformat}
# /etc/security/limits.d/cassandra.conf
cassandra soft memlock unlimited
cassandra hard memlock unlimited
cassandra soft nofile 100000
cassandra hard nofile 100000
cassandra soft nproc 32768
cassandra hard nproc 32768
cassandra - as unlimited{noformat}
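
Limits from `/etc/security/limits.d` only apply to sessions started after the file is in place, so it can be worth checking what a running process actually received. This is a minimal sketch for Linux, not part of the setup above. The helper name `proc_limits` is made up for illustration, and finding the Cassandra PID via `pgrep -f CassandraDaemon` assumes the daemon's main class appears on its command line.

```shell
# proc_limits PID: print the limits that correspond to the memlock, nofile,
# nproc and as entries, as the kernel reports them for that process.
proc_limits() {
  grep -E '^Max (locked memory|open files|processes|address space)' "/proc/$1/limits"
}

# Show the limits of the current shell; for a running node one might use
#   proc_limits "$(pgrep -f CassandraDaemon | head -n 1)"
proc_limits "$$"
```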
 
{noformat}
# /etc/sysctl.d/60-cassandra.conf
vm.max_map_count = 1048575{noformat}
{noformat}
sudo swapoff --all
sudo sysctl -p{noformat}
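
To confirm that kernel settings like these took effect, the live values can be read back from `/proc`. This is a minimal sketch for Linux, not part of the original setup. The helper names are hypothetical, and `vm.zone_reclaim_mode` may be reported as unavailable on kernels built without NUMA support.

```shell
# show_sysctl KEY...: print "key = value" for each key, or "unavailable" when
# the kernel does not expose it. sysctl keys map to /proc/sys paths with the
# dots replaced by slashes.
show_sysctl() {
  for key in "$@"; do
    path="/proc/sys/$(printf '%s' "$key" | tr . /)"
    if [ -r "$path" ]; then
      printf '%s = %s\n' "$key" "$(cat "$path")"
    else
      printf '%s = unavailable\n' "$key"
    fi
  done
}

# swap_is_off: succeed when /proc/swaps lists no active swap devices
# (the first line of that file is a header).
swap_is_off() {
  ! tail -n +2 /proc/swaps 2>/dev/null | grep -q .
}

show_sysctl kernel.perf_event_paranoid kernel.kptr_restrict \
  vm.zone_reclaim_mode vm.max_map_count
if swap_is_off; then echo "swap: off"; else echo "swap: still enabled"; fi
```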
 
> Remove lock contention generated by getCapacity function in SSTableReader
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19429
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/SSTable
>            Reporter: Dipietro Salvatore
>            Assignee: Dipietro Salvatore
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x
>
>         Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot 
> 2024-02-27 at 11.29.41.png, asprof_cass4.1.3__lock_20240216052912lock.html, 
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock 
> acquires is measured in the `getCapacity` function from 
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60 
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04), 
> this limits the CPU utilization of the system to under 50% when testing at 
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing 
> the call to `getCapacity` with `size` achieves up to 2.95x increase in 
> throughput on r8g.24xlarge and 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>  
> Instructions to reproduce:
> {code:java}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && \
> CASSANDRA_USE_JDK11=true ant jar && \
> CASSANDRA_USE_JDK11=true ant stress-build && \
> rm -rf data && bin/cassandra -f -R
> # Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
> bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
> bin/nodetool clearsnapshot --all && \
> tools/bin/cassandra-stress write n=10000000 cl=ONE -rate threads=384 \
>   -node 127.0.0.1 -log file=cload.log -graph file=cload.html && \
> bin/nodetool compact keyspace1 && sleep 30s && \
> tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m \
>   cl=ONE -rate threads=406 -node localhost -log file=result.log \
>   -graph file=graph.html
> {code}



