[
https://issues.apache.org/jira/browse/CASSANDRA-21141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18055222#comment-18055222
]
Dmitry Konstantinov commented on CASSANDRA-21141:
-------------------------------------------------
h2. Test scenario
* Batch write test
* 1 partition text column (size = 15 symbols)
* 1 clustering text column (size = 10 symbols)
* 5 value text columns (size = 10 symbols each)
* inserts are done using 10-row batches
* [^batch_profile_seq.yaml]
* CLI:
{code:bash}
./tools/bin/cassandra-stress "user profile=./batch_profile_seq.yaml no-warmup ops(insert=1,partition-select=0) n=15m" -rate threads=300 -node <IP> -mode native cql3 maxPending=256 connectionsPerHost=16 {code}
note: using a uniform distribution for the value columns creates a bottleneck in cassandra-stress itself (in its random value generation logic), so seq is used instead.
h2. Configuration
* 1-node deployment
* jdk-17.0.15+6
* compaction is disabled
* commit log is disabled (due to lack of extra disk on my env to keep the IO
rate)
* trie memtable (64 shards), memtable_allocation_type: offheap_objects
* GC: G1
{code:none}
-XX:ParallelGCThreads=16
-XX:ConcGCThreads=4 {code}
* memtable_flush_writers: 8
* native_transport_max_request_data_in_flight / native_transport_max_request_data_in_flight_per_ip increased to 2GiB
* -Dcassandra.set_sep_thread_name=false
* -Dio.netty.eventLoopThreads=2 (by default it is equal to number of cores)
* [^jvm-server.options]
* [^jvm17-server.options]
* [^cassandra.yaml]
h2. Env
* Cassandra node:
** m8i.4xlarge, CPU: Intel Xeon 16 vCPU, RAM: 64GiB
** disk: gp3, 50 GB, IOPS limit: 3000, throughput limit: 200 MiB/s
* Cassandra stress node:
** c5.9xlarge, CPU: Intel Xeon 36 vCPU, RAM: 72 GiB
h2. Results
||{{Before}}||{{After}}||
|{{Results:}}
*{{Op rate : 215,372 op/s}}*
{{Partition rate : 215,372 pk/s}}
{{Row rate : 2,153,722 row/s}}
{{Latency mean : 1.4 ms }}
{{Latency median : 0.9 ms }}
{{Latency 95th percentile : 2.6 ms }}
{{Latency 99th percentile : 7.9 ms }}
{{Latency 99.9th percentile : 24.0 ms }}
{{Latency max : 275.0 ms }}
{{Total partitions : 15,000,000 }}
{{Total errors : 0 }}
{{Total GC count : 26}}
*{{Total GC memory : 338.469 GiB}}*
{{Total GC time : 6.0 seconds}}
{{Avg GC time : 229.8 ms}}
{{StdDev GC time : 30.0 ms}}
{{Total operation time : 00:01:09}}|{{Results:}}
*{{Op rate : 230,568 op/s}}*
{{Partition rate : 230,568 pk/s}}
{{Row rate : 2,305,677 row/s}}
{{Latency mean : 1.3 ms }}
{{Latency median : 0.9 ms}}
{{Latency 95th percentile : 2.3 ms}}
{{Latency 99th percentile : 6.5 ms }}
{{Latency 99.9th percentile : 23.1 ms }}
{{Latency max : 329.3 ms }}
{{Total partitions : 15,000,000}}
{{Total errors : 0 }}
{{Total GC count : 21}}
*{{Total GC memory : 261.422 GiB}}*
{{Total GC time : 5.5 seconds}}
{{Avg GC time : 264.0 ms}}
{{StdDev GC time : 25.7 ms}}
{{Total operation time : 00:01:05}}|
> Reduce memory allocation during transformation of BatchStatement to Mutation
> ----------------------------------------------------------------------------
>
> Key: CASSANDRA-21141
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21141
> Project: Apache Cassandra
> Issue Type: Improvement
> Components: CQL/Interpreter
> Reporter: Dmitry Konstantinov
> Assignee: Dmitry Konstantinov
> Priority: Normal
> Fix For: 5.x
>
> Attachments: CASSANDRA-21141_alloc.html, CASSANDRA-21141_cpu.html,
> CASSANDRA-21141_wall.html, batch_profile_seq.yaml, cassandra.yaml,
> image-2026-01-28-09-39-38-183.png, jvm-server.options, jvm17-server.options,
> trunk_alloc.html, trunk_cpu.html, trunk_wall.html
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We allocate a lot of objects during the transformation of a BatchStatement to
> a Mutation. In many typical scenarios we can take a fast path and reduce the
> number of allocated objects (as well as make the corresponding logic faster).
> Allocation flame graph:
> !image-2026-01-28-09-39-38-183.png|width=600!
> * [^trunk_alloc.html]
> * [^trunk_cpu.html]
> * [^trunk_wall.html]
> Suggested optimisations:
> * force hash3_x64_128 inlining to help JIT with escape analysis and long[]
> heap allocation elimination, so the hash function value (long[2]) is not
> allocated on heap -
> [link|https://github.com/apache/cassandra/pull/4589/changes/02d5ae650c9581ea061fb1255e2078a278697b6d]
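A minimal sketch of why inlining matters here: once the JIT inlines a small method that returns a long[2], escape analysis can see the array never leaves the caller's frame and scalar-replace it, eliminating the heap allocation. The mixer below is a toy stand-in, not the real hash3_x64_128.

```java
// Illustrative only: HashInlineSketch and its toy mixer are NOT Cassandra code.
final class HashInlineSketch {
    // Small, inline-friendly method returning both 64-bit halves of a hash.
    static long[] hash128(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // golden-ratio multiply (bijective)
        h ^= (h >>> 32);                    // xor-shift mix (bijective)
        return new long[] { h, Long.rotateLeft(h, 17) };
    }

    // Caller uses only one half; after inlining, escape analysis can prove
    // the long[2] never escapes this frame and elide the allocation.
    static long fingerprint(long key) {
        long[] h = hash128(key);
        return h[0];
    }
}
```

If the callee is too large to inline, the array must be materialized on the heap, which is why forcing inlining of the hash function can remove a per-call allocation.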
> * serializedRowBodySize: avoid allocating a capturing lambda per cell by
> moving the captured arguments into SerializationHelper (the same optimization
> was done in serializeRowBody for flushing some time ago) -
> [link|https://github.com/apache/cassandra/pull/4589/changes/34a3d7126351630eb91be1ba9546a6e3c84d9359]
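The capturing-lambda point can be sketched as follows: a lambda that closes over local variables allocates a fresh object each time the enclosing method runs, while a stateless lambda is created once and cached by the JVM. SizeHelper below is a hypothetical stand-in for SerializationHelper.

```java
import java.util.List;
import java.util.function.ToIntFunction;

// Illustrative only: names here are stand-ins, not Cassandra's actual classes.
final class LambdaAllocSketch {
    static final class SizeHelper {
        int version; // state that would otherwise be captured by the lambda
    }

    // Stateless lambda: captures nothing, so the JVM reuses one instance.
    static final ToIntFunction<String> SIZER = s -> s.length();

    // Per-cell state travels through the helper argument instead of a
    // freshly allocated capturing lambda on every invocation.
    static int totalSize(List<String> cells, SizeHelper helper) {
        int total = 0;
        for (String cell : cells)
            total += SIZER.applyAsInt(cell) + helper.version;
        return total;
    }
}
```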
> * UpdateParameters: allocate DeletionTime on demand (it is not needed if we
> only do inserts/updates) -
> [link|https://github.com/apache/cassandra/pull/4589/changes/f8f57ea14f0c40fabb0f049a79146f403c88a009]
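The allocate-on-demand pattern can be sketched like this: the deletion-time object is only built when a delete actually needs it, so the common insert/update path pays nothing. LazyDeletionSketch is illustrative, not Cassandra's actual UpdateParameters.

```java
// Illustrative only: a lazy-allocation sketch, not Cassandra code.
final class LazyDeletionSketch {
    static final class DeletionTime {
        final long markedAtMicros;
        DeletionTime(long markedAtMicros) { this.markedAtMicros = markedAtMicros; }
    }

    static final class UpdateParams {
        private final long timestampMicros;
        private DeletionTime deletion; // stays null on the insert/update path

        UpdateParams(long timestampMicros) { this.timestampMicros = timestampMicros; }

        // Allocate only on first use, i.e. only when a delete needs it.
        DeletionTime deletionTime() {
            if (deletion == null)
                deletion = new DeletionTime(timestampMicros);
            return deletion;
        }

        boolean deletionAllocated() { return deletion != null; }
    }
}
```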
> * Add a fast path in the valuesAsClustering logic for the typical scenario
> where a single clustering key (a single row) is specified for modification -
> [link|https://github.com/apache/cassandra/pull/4589/changes/e11961cf457a4545951cbfa0d20e2b929d5ae453]
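The single-value fast path amounts to: when exactly one value is specified (the common single-row write), build the result directly and skip the generic builder/list machinery. A minimal sketch, with hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a fast-path shape, not Cassandra's valuesAsClustering.
final class FastPathSketch {
    // Generic path: builder-style accumulation, needed for multi-row cases.
    static List<String> slowPath(List<String> values) {
        List<String> clusterings = new ArrayList<>(values.size());
        for (String v : values)
            clusterings.add(v);
        return clusterings;
    }

    static List<String> valuesAsClustering(List<String> values) {
        if (values.size() == 1)            // fast path: single row to modify
            return List.of(values.get(0)); // no ArrayList, no loop
        return slowPath(values);
    }
}
```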
> * Add a fast path in the nonTokenRestrictionValues logic for the typical
> scenario where a single partition key (a single row) is specified for
> modification; also optimize the case where a partition or clustering key
> consists of a single column -
> [link|https://github.com/apache/cassandra/pull/4589/changes/b7fe9cc34c0a6c0c3d20b12fc2ccd8a11f98f460]
> * BatchStatement: check whether many similar rows for the same table are
> written unconditionally; in this case we can avoid merging column info and
> allocating builders -
> [link|https://github.com/apache/cassandra/pull/4589/changes/d011dfa68b88fa2d52c9a661d4945c719febf1d5]
> * Avoid ClusteringIndexSliceFilter allocation if a write does not require a
> read (a plain write); avoid iterator allocation; use an array instead of an
> ArrayList for perStatementOptions, which does not grow dynamically -
> [link|https://github.com/apache/cassandra/pull/4589/changes/0e4abb36105457d7f5e630d6e8b40b560794ae2e]
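The array-versus-ArrayList point can be sketched as: when the element count is known up front and never changes, a plain array avoids the ArrayList wrapper object and any resize-and-copy of its backing array. Names below are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: contrasts the two allocation patterns, not Cassandra code.
final class FixedSizeSketch {
    // ArrayList version: one extra wrapper object, plus growth copies
    // if the initial capacity is undersized.
    static List<String> withList(String[] statements) {
        List<String> options = new ArrayList<>();
        for (String s : statements)
            options.add(s + ":opts");
        return options;
    }

    // Array version: exactly one allocation of exactly the right size.
    static String[] withArray(String[] statements) {
        String[] options = new String[statements.length];
        for (int i = 0; i < statements.length; i++)
            options[i] = statements[i] + ":opts";
        return options;
    }
}
```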
> Profiles after the changes:
> * [^CASSANDRA-21141_alloc.html]
> * [^CASSANDRA-21141_cpu.html]
> * [^CASSANDRA-21141_wall.html]
> Forecasted reduction of heap allocations in the batch write test is ~21%:
> {code:java}
> Total GC memory : 347.198 GiB
> vs
> Total GC memory : 272.358 GiB
> {code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)