[
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dipietro Salvatore updated CASSANDRA-19429:
-------------------------------------------
Description:
While profiling Cassandra 4.1.3 on large AWS instances, we measured a high
number of lock acquires in the `getCapacity` function of
`org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60
seconds). In our tests on r8g.24xlarge instances (using Ubuntu 22.04), this
contention keeps the CPU utilization of the system under 50% at full load and
therefore limits the achieved throughput.
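The attachment name suggests the lock profile was taken with async-profiler. A comparable profile could be collected roughly as follows. This is only a sketch: it assumes async-profiler 3.x with its `asprof` launcher on the PATH, and the function name `profile_locks`, the default output file name, and the `pgrep` pattern in the usage comment are illustrative choices, not part of the report.

```bash
# Sketch only: assumes async-profiler 3.x is installed and `asprof` is on PATH.
# Samples lock contention events for 60 seconds and writes an HTML flame graph.
profile_locks() {
  pid="$1"                  # PID of the running Cassandra JVM
  out="${2:-lock.html}"     # output file (hypothetical default name)
  asprof -e lock -d 60 -f "$out" "$pid"
}

# Possible usage while the stress test is running (pattern is an assumption):
#   profile_locks "$(pgrep -f CassandraDaemon)" cassandra_lock.html
```

The 60 second duration matches the window quoted above, so the number of lock samples can be compared directly with the 1.9M figure.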
Removing the lock contention in SSTableReader.java by replacing the call to
`getCapacity` with `size` achieves up to a 2.95x increase in throughput on
r8g.24xlarge and about 2x on r7i.24xlarge:
| Instance type | Cass 4.1.3 | Cass 4.1.3 patched |
| ------------- | ---------- | ------------------ |
| r8g.24xlarge | 168k ops | 496k ops (2.95x) |
| r7i.24xlarge | 153k ops | 304k ops (1.98x) |
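As a quick sanity check, the speedups can be recomputed from the rounded throughput figures in the table. The helper name `speedup` is hypothetical. Note that the rounded r7i.24xlarge figures give 1.99 rather than the reported 1.98x, presumably because the reported ratio was derived from unrounded measurements.

```bash
# Hypothetical helper: ratio of patched to baseline throughput, two decimals.
speedup() {
  awk -v base="$1" -v patched="$2" 'BEGIN { printf "%.2f\n", patched / base }'
}

speedup 168 496   # r8g.24xlarge, prints 2.95
speedup 153 304   # r7i.24xlarge, prints 1.99
```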
Instructions to reproduce:
```bash
## Requirements for Ubuntu 22.04
sudo apt install -y ant git openjdk-11-jdk

## Build and run (bin/cassandra -f stays in the foreground)
CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar && \
CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f -R

## Run (from a second terminal, once Cassandra is up)
bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write n=10000000 \
cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph \
file=cload.html && \
bin/nodetool compact keyspace1 && sleep 30s && \
tools/bin/cassandra-stress mixed 'ratio(write=10,read=90)' duration=10m cl=ONE \
-rate threads=406 -node localhost -log file=result.log -graph file=graph.html
```
was:
Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock
acquires is measured in the `getCapacity` function from
`org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60
seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04),
this limits the CPU utilization of the system to under 50% when testing at full
load and therefore limits the achieved throughput.
Removing the lock contention from the SSTableReader.java file by replacing the
call to `getCapacity` with `size` achieves up to 2.95x increase in throughput
on r8g.24xlarge and 2x on r7i.24xlarge:
| Instance type | Cass 4.1.3 | Cass 4.1.3 patched |
| ------------- | ---------- | ------------------ |
| r8g.24xlarge | 168k ops | 496k ops (2.95x) |
| r7i.24xlarge | 153k ops | 304k ops (1.98x) |
Instructions to reproduce:
```bash
## Requirements for Ubuntu 22.04
sudo apt install -y ant git openjdk-11-jdk
## Build and run
CASSANDRA_USE_JDK11=true ant realclean && CASSANDRA_USE_JDK11=true ant jar &&
CASSANDRA_USE_JDK11=true ant stress-build && rm -rf data && bin/cassandra -f -R
# Run
bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
bin/nodetool clearsnapshot --all && tools/bin/cassandra-stress write n=10000000 \
cl=ONE -rate threads=384 -node 127.0.0.1 -log file=cload.log -graph \
file=cload.html && \
bin/nodetool compact keyspace1 && sleep 30s && \
tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m cl=ONE \
-rate threads=406 -node localhost -log file=result.log -graph file=graph.html
```
> Remove lock contention generated by getCapacity function in SSTableReader
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
> Issue Type: Bug
> Reporter: Dipietro Salvatore
> Priority: Normal
> Attachments: asprof_cass4.1.3__lock_20240216052912lock.html