[
https://issues.apache.org/jira/browse/CASSANDRA-19429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843589#comment-17843589
]
Jon Haddad commented on CASSANDRA-19429:
----------------------------------------
{quote}there are 63 instances in the production codebase where we do not check
the tracing level and just log. We should take a more holistic approach here and
fix this everywhere, which would probably be better suited to a new ticket.
{quote}
SGTM
{quote}Anyway, I think we should still deliver this with the changes the OP
suggested (and that I improved on). This seems like a fairly innocent change
and I do not see where it might go wrong. We should double-check that our
understanding of getSize vs getCapacity is correct, though.
{quote}
Yeah, I agree with this train of thought; I just don't know whether they're
equivalent. My point was more that if we're unsure, merging only the check on
the trace level would deliver the exact same performance benefit. If you want to
verify the correctness aspect, that SGTM, but since I don't have time to do it
myself, I would look for someone else to approve the patch from a correctness
perspective. A +1 on just a trace check is a no-brainer, and I'd be fine doing
that.
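
To make the "trace check" idea concrete, here is a minimal sketch. It is not the actual SSTableReader code: `LockedCache` is a hypothetical stand-in for a cache whose `getCapacity` takes a lock, and `java.util.logging` (with `Level.FINEST` playing the role of trace) is used only so the example runs without extra dependencies. With an SLF4J-style logger the equivalent guard would be `logger.isTraceEnabled()`.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;
import java.util.logging.Level;
import java.util.logging.Logger;

public class TraceGuardSketch {
    private static final Logger logger = Logger.getLogger(TraceGuardSketch.class.getName());

    /** Hypothetical cache: every capacity lookup acquires a lock (assumption for illustration). */
    static final class LockedCache {
        private final ReentrantLock lock = new ReentrantLock();
        final AtomicLong lockAcquires = new AtomicLong();
        private final long capacity;

        LockedCache(long capacity) {
            this.capacity = capacity;
        }

        long getCapacity() {
            lock.lock();
            try {
                lockAcquires.incrementAndGet(); // count how often the lock is taken
                return capacity;
            } finally {
                lock.unlock();
            }
        }
    }

    /** Unguarded: the argument is evaluated, and the lock taken, even if the message is dropped. */
    static void logUnguarded(LockedCache cache) {
        logger.log(Level.FINEST, "cache capacity: {0}", cache.getCapacity());
    }

    /** Guarded: the locked call only happens when trace-level logging is enabled. */
    static void logGuarded(LockedCache cache) {
        if (logger.isLoggable(Level.FINEST)) {
            logger.log(Level.FINEST, "cache capacity: {0}", cache.getCapacity());
        }
    }

    /** Runs the chosen variant with trace logging disabled and returns the lock acquire count. */
    public static long lockAcquiresAfter(int calls, boolean guarded) {
        logger.setLevel(Level.INFO); // FINEST (trace) is disabled at this level
        LockedCache cache = new LockedCache(1024);
        for (int i = 0; i < calls; i++) {
            if (guarded) {
                logGuarded(cache);
            } else {
                logUnguarded(cache);
            }
        }
        return cache.lockAcquires.get();
    }

    public static void main(String[] args) {
        System.out.println("unguarded lock acquires: " + lockAcquiresAfter(1000, false));
        System.out.println("guarded lock acquires:   " + lockAcquiresAfter(1000, true));
    }
}
```

With trace logging disabled, the unguarded variant still takes the lock on every call, while the guarded variant never does. That is why the guard alone removes the contention on this path without having to settle whether the size and capacity calls are equivalent.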
> Remove lock contention generated by getCapacity function in SSTableReader
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-19429
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19429
> Project: Cassandra
> Issue Type: Bug
> Components: Local/SSTable
> Reporter: Dipietro Salvatore
> Assignee: Dipietro Salvatore
> Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: Screenshot 2024-02-26 at 10.27.10.png, Screenshot
> 2024-02-27 at 11.29.41.png, Screenshot 2024-03-19 at 15.22.50.png,
> asprof_cass4.1.3__lock_20240216052912lock.html,
> image-2024-03-08-15-51-30-439.png, image-2024-03-08-15-52-07-902.png
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Profiling Cassandra 4.1.3 on large AWS instances, a high number of lock
> acquires is measured in the `getCapacity` function from
> `org/apache/cassandra/cache/InstrumentingCache` (1.9M lock acquires per 60
> seconds). Based on our tests on r8g.24xlarge instances (using Ubuntu 22.04),
> this limits the CPU utilization of the system to under 50% when testing at
> full load and therefore limits the achieved throughput.
> Removing the lock contention from the SSTableReader.java file by replacing
> the call to `getCapacity` with `size` achieves up to a 2.95x increase in
> throughput on r8g.24xlarge and roughly 2x on r7i.24xlarge:
> |Instance type|Cass 4.1.3|Cass 4.1.3 patched|
> |r8g.24xlarge|168k ops|496k ops (2.95x)|
> |r7i.24xlarge|153k ops|304k ops (1.98x)|
>
> Instructions to reproduce:
> {code:bash}
> ## Requirements for Ubuntu 22.04
> sudo apt install -y ant git openjdk-11-jdk
> ## Build and run
> CASSANDRA_USE_JDK11=true ant realclean && \
>   CASSANDRA_USE_JDK11=true ant jar && \
>   CASSANDRA_USE_JDK11=true ant stress-build && \
>   rm -rf data && bin/cassandra -f -R
> ## Run
> bin/cqlsh -e 'drop table if exists keyspace1.standard1;' && \
>   bin/cqlsh -e 'drop keyspace if exists keyspace1;' && \
>   bin/nodetool clearsnapshot --all && \
>   tools/bin/cassandra-stress write n=10000000 cl=ONE -rate threads=384 \
>     -node 127.0.0.1 -log file=cload.log -graph file=cload.html && \
>   bin/nodetool compact keyspace1 && sleep 30s && \
>   tools/bin/cassandra-stress mixed ratio\(write=10,read=90\) duration=10m \
>     cl=ONE -rate threads=406 -node localhost -log file=result.log \
>     -graph file=graph.html
> {code}