[
https://issues.apache.org/jira/browse/CASSANDRA-19949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17885065#comment-17885065
]
Romain Anselin edited comment on CASSANDRA-19949 at 9/26/24 3:35 PM:
---------------------------------------------------------------------
Result on 4.0.0 with 3 nodes and RF3 (after bumping range_request_timeout_in_ms
to 20000 and setting an execution profile timeout of 25s - otherwise it timed out):
{code:java}
$ python3 objcount.py -i 10.x.y.z -k romain -t count_perf | tee cass400count.txt
# COUNT #
1f474e30-7c1c-11ef-93d9-b11043d37775
Row count:100000
Count timing with fetch 5000: 0:00:14.870138
Average row size: 10000.0{code}
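For reference, the server-side timeout bump above corresponds to this cassandra.yaml fragment (a sketch using the 4.0.x setting name; 4.1 renamed it to range_request_timeout with an explicit duration unit):

```yaml
# cassandra.yaml (4.0.x) - give range scans enough headroom for the count test
range_request_timeout_in_ms: 20000
```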
+ Attached cass400trace.txt based on
{code:java}
cqlsh -e "show session 1f474e30-7c1c-11ef-93d9-b11043d37775" | tee
cass400trace.txt{code}
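The client-side accumulation that the script's output suggests (page through the rows with a fixed fetch size, count them, sum their sizes, and time the loop) can be sketched in plain Python. The helper below is hypothetical: `pages` stands in for the driver's paged result set, which objcount.py would obtain from the Cassandra Python driver.

```python
from datetime import datetime

def timed_count(pages):
    # `pages` stands in for the driver's paged result set: an iterable of
    # row batches, each at most fetch_size rows (hypothetical shape; the
    # real script drives the Cassandra Python driver instead).
    start = datetime.now()
    total = 0
    size = 0
    for page in pages:
        total += len(page)
        size += sum(len(row) for row in page)
    elapsed = datetime.now() - start
    avg = size / total if total else 0.0
    return total, elapsed, avg
```

With 20 pages of 5000 rows of 10kB strings this would report a row count of 100000 and an average row size of 10000.0, matching the output above; the elapsed time is dominated by the server-side range scans that the attached traces capture.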
> Count performance regression in Cassandra 4.x
> ---------------------------------------------
>
> Key: CASSANDRA-19949
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19949
> Project: Cassandra
> Issue Type: Bug
> Reporter: Romain Anselin
> Priority: Normal
> Fix For: 4.0.x, 4.1.x
>
> Attachments: cass311count.txt, cass311debugcount.txt,
> cass311trace.txt, cass400trace.txt, cass41count.txt, cass41debugcount.txt,
> cass41trace.txt, objcount-1.py
>
>
> Cassandra 4 exhibits a severe performance drop on count operations.
> We created a reproduction workflow inserting 100k rows of 10kB random strings.
> After this data is inserted into a 3-node cluster at RF3 and queried at
> LOCAL_QUORUM, a count on said table takes
> - circa 2s on 3.11
> - consistently more than 10s on 4.0 and 4.1 (around 12 to 13s) - tested on
> 4.0.10 and 4.1.5
> Output of the same program/query against each environment:
> 3.11
> {code:java}
> # COUNT #
> 61a5bcb0-75ca-11ef-9cff-55d571fe1347
> Row count:100000
> Count timing with fetch 5000: 0:00:01.846531
> Average row size: 10000.0{code}
> 4.1
> {code:java}
> # COUNT #
> 55d79f60-75cb-11ef-a8be-399c3e257132
> Row count:100000
> Count timing with fetch 5000: 0:00:13.408626
> Average row size: 10000.0{code}
> The UUID shown in the above output is the trace ID from execution of the
> query, which is then exported from each cluster via the command below,
> providing the cassXXtrace.txt file
> {noformat}
> cqlsh -e show session [trace_id] | tee cassXXtrace.txt{noformat}
> Attached cass311trace.txt and cass41trace.txt, which show the events
> associated with the above query.
> Note the issue is far more prevalent in a 3-node cluster (I also tested a
> single node in Docker, where it is less visible).
> Attaching objcount.py, which contains 2 functions to insert and read the
> data. The insert is pretty slow because it generates 100k random 10kB junk
> objects, but it allows reproducing the issue. Just uncomment the
> gateway_insert call to trigger the data insert:
> {code:java}
> # gateway_insert(session, ks, tbl)
> gateway_query(session, ks, tbl, fetch){code}
> Requires argparse and the Cassandra Python driver.
> To use, run the following command. Consider uncommenting l.40 and 41 for
> ks/table creation and l.155 for the insert workload:
> {code:java}
> python3 ./objcount.py -i <ip> -k <ks> -t <table>{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]