[jira] [Comment Edited] (CASSANDRA-21083) Optimize memtable flush logic

Dmitry Konstantinov (Jira) Thu, 18 Dec 2025 15:08:09 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-21083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18046432#comment-18046432
 ]


Dmitry Konstantinov edited comment on CASSANDRA-21083 at 12/18/25 11:07 PM:
----------------------------------------------------------------------------

An initial draft: 
[https://github.com/apache/cassandra/compare/trunk...netudima:cassandra:CASSANDRA-21083-trunk?expand=1]

The current results (same test as here: 
https://issues.apache.org/jira/browse/CASSANDRA-20226): 
{code:java}
Results:
Op rate                   :  193,357 op/s  [insert: 193,357 op/s]
Partition rate            :  193,357 pk/s  [insert: 193,357 pk/s]
Row rate                  : 1,933,570 row/s [insert: 1,933,570 row/s]
Latency mean              :    1.5 ms [insert: 1.5 ms]
Latency median            :    1.2 ms [insert: 1.2 ms]
Latency 95th percentile   :    2.3 ms [insert: 2.3 ms]
Latency 99th percentile   :    5.8 ms [insert: 5.8 ms]
Latency 99.9th percentile :   24.5 ms [insert: 24.5 ms]
Latency max               :  517.2 ms [insert: 517.2 ms]
Total partitions          : 15,000,000 [insert: 15,000,000]
Total errors              :          0 [insert: 0]
Total GC count            : 36
Total GC memory           : 350.825 GiB
Total GC time             :    7.0 seconds
Avg GC time               :  193.7 ms
StdDev GC time            :   98.6 ms
Total operation time      : 00:01:17{code}
[^CASSANDRA-21083.html]

 

Flush statistics for a single memtable (allocated: heap allocated by the 
flushing thread; cpu time: CPU time consumed by the flushing thread; time 
spent: wall-clock time).
Before:
{code:java}
INFO  [PerDiskMemtableFlushWriter_0:5] 2025-12-18T22:34:05,712 
Flushing.java:196 - Completed flushing 
/u02/dmko_cassandra/data/stress/stress_table-1b255f4def2540a60000000000000005/pa-43-big-Data.db
 (226.710MiB) for commitlog position CommitLogPosition(segmentId=1766097136053, 
position=169847), time spent: 6290 ms, bytes flushed: 237723076/(37.785MiB per 
sec), partitions flushed: 258854/(43142 per sec), rows: 2588540/(431423 per 
sec), cpu time: 4023 ms, allocated: 953.590MiB {code}
After:
{code:java}
INFO  [PerDiskMemtableFlushWriter_0:11] 2025-12-18T20:08:42,743 
Flushing.java:196 - Completed flushing 
/u02/dmko_cassandra/data/stress/stress_table-1b255f4def2540a60000000000000005/pa-37-big-Data.db
 (226.686MiB) for commitlog position CommitLogPosition(segmentId=1766088439199, 
position=165252), time spent: 3767 ms, bytes flushed: 237697547/(79232515 per 
sec), partitions flushed: 258813/(86271 per sec), rows: 2588130/(862710 per 
sec), cpu time: 2581 ms, allocated: 204.196MiB
{code}
|| ||Before||After||
|Heap allocated, MiB|953.590|204.196|
|Flushed rows/sec|431423|862710|
|CPU time, ms|4023|2581|
|Clock time, ms|6290|3767|
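
To make the relative change easier to read, the before/after figures can be turned into percentages. The following is only a small arithmetic sketch: the class and method names are hypothetical and not part of the patch, and the inputs are copied from the two log lines quoted earlier.

```java
import java.util.Locale;

// Hypothetical helper, used only to do the arithmetic on the reported numbers.
final class FlushComparison {
    /** Percentage by which a value shrank; positive means "after" is smaller. */
    static double reductionPercent(double before, double after) {
        return (before - after) / before * 100.0;
    }

    /** Ratio after/before, used here for the rows/sec throughput. */
    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // Values taken from the "Before" and "After" flush log lines.
        System.out.printf(Locale.ROOT, "Heap allocated: %.1f%% less%n",
                          reductionPercent(953.590, 204.196));
        System.out.printf(Locale.ROOT, "Flushed rows/sec: %.2fx%n",
                          speedup(431423, 862710));
        System.out.printf(Locale.ROOT, "CPU time: %.1f%% less%n",
                          reductionPercent(4023, 2581));
        System.out.printf(Locale.ROOT, "Clock time: %.1f%% less%n",
                          reductionPercent(6290, 3767));
    }
}
```

For this single pair of flushes that works out to roughly 79% less heap allocated, about 2x the flushed rows per second, about 36% less CPU time and about 40% less clock time. It compares one flush from each run, so it is an indication rather than a statistically robust measurement.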


> Optimize memtable flush logic
> -----------------------------
>
>                 Key: CASSANDRA-21083
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21083
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Local/Memtable
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: CASSANDRA-21083.html
>
>
> Memtable flushing to disk impacts write performance and can be a limiting 
> factor for write throughput:
>  * If we cannot flush fast enough we have to limit writes to memtables due to 
> lack of available memory for them
>  * flushing logic can be CPU-intensive and compete with writing threads for 
> CPU by stealing 1-2 cores (or even more if memtable_flush_writers is set to a 
> higher value)
> Suggested optimisations:
>  # invoke MetadataCollector.updateClusteringValues only for first and last 
> clustering key in a partition, not for every row 
> ([link|https://github.com/apache/cassandra/commit/df2df1d0eefc8b603eafa87f42ed1975dfc46143])
>  # split call sites in the Cell.Serializer serialize logic to avoid 
> megamorphic calls + make cell.isCounterCell check cheaper (avoid megamorphic 
> call + pre-calculate isCounterColumn info) 
> ([link|https://github.com/apache/cassandra/commit/7653194932e9ffb966c0c1c1f76fbcf532f222a7])
>  # check if Guardrails enabled at the beginning of writing, not per row, 
> avoid hidden auto-boxing for logging of primitive parameters 
> ([link|https://github.com/apache/cassandra/commit/f17e835108ad6f282257e95992084a33e9d47b52])
>  # add fast return for BTreeRow.hasComplexDeletion if there were no deletions, 
> avoid column.name.bytes.hashCode if not needed, avoid capturing lambda 
> allocation in UnfilteredSerializer.serializeRowBody 
> ([link|https://github.com/apache/cassandra/commit/8c2d0f5a24a6e25832d7fae6668f01fbbccc285a])
>  # reduce allocations during serialization of NativeClustering 
> ([link|https://github.com/apache/cassandra/commit/457f2803efd2af1a919dbe56fe958627e5652fc2])
>  # do not re-map columns in serializeRowBody if they haven't changed 
> ([link|https://github.com/apache/cassandra/commit/c0f08ee437f5d468f83ff6a1c952182bc5b156a4])
>  # add flushing iterator without column filtering 
> ([link|https://github.com/apache/cassandra/commit/0f77206334320815dc80622122b40dc2f5c3f6fd])
>  # split call sites in MetadataCollector.update(Cell<?> cell) to improve cell 
> methods inlining and use monomorphic calls 
> ([link|https://github.com/apache/cassandra/commit/7fa2bf3ba4e11a2c4e7be421aba1295fb6738f18])
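
Two of the ideas in the list above, splitting call sites and moving a per-row check to the start of writing, can be illustrated with a minimal sketch. It does not reproduce the linked commits: SketchCell, HeapCell, OffHeapCell, EmptyCell and FlushLoopSketch are hypothetical stand-ins, not Cassandra classes. The assumption behind the first idea is the commonly described HotSpot behaviour that a call site seeing three or more receiver types is usually dispatched virtually and not inlined, while a site seeing one type can be inlined.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

// Hypothetical cell abstraction with three implementations.
interface SketchCell { int serializedSize(); }

final class HeapCell implements SketchCell {
    private final byte[] value;
    HeapCell(byte[] value) { this.value = value; }
    @Override public int serializedSize() { return value.length; }
}

final class OffHeapCell implements SketchCell {
    private final int length;
    OffHeapCell(int length) { this.length = length; }
    @Override public int serializedSize() { return length; }
}

final class EmptyCell implements SketchCell {
    @Override public int serializedSize() { return 0; }
}

final class FlushLoopSketch {
    // One call site that sees all three receiver types.
    static long totalSizeSingleSite(List<SketchCell> cells) {
        long total = 0;
        for (SketchCell cell : cells)
            total += cell.serializedSize();
        return total;
    }

    // Split call sites: each of the first two branches sees exactly one type.
    static long totalSizeSplitSites(List<SketchCell> cells) {
        long total = 0;
        for (SketchCell cell : cells) {
            if (cell instanceof HeapCell)
                total += ((HeapCell) cell).serializedSize();
            else if (cell instanceof OffHeapCell)
                total += ((OffHeapCell) cell).serializedSize();
            else
                total += cell.serializedSize(); // fallback for any other type
        }
        return total;
    }

    // The enabled flag is read once before the loop, not once per row.
    static int countOversizedRows(int[] rowSizes, int threshold, BooleanSupplier guardEnabled) {
        if (!guardEnabled.getAsBoolean())
            return 0;
        int oversized = 0;
        for (int size : rowSizes)
            if (size > threshold)
                oversized++;
        return oversized;
    }

    public static void main(String[] args) {
        List<SketchCell> cells = Arrays.asList(
            new HeapCell(new byte[3]), new OffHeapCell(5), new EmptyCell());
        System.out.println(totalSizeSingleSite(cells));
        System.out.println(totalSizeSplitSites(cells));
        System.out.println(countOversizedRows(new int[] {1, 10, 20}, 5, () -> true));
    }
}
```

Both size methods return the same result; any difference is in what the JIT can inline, which would need to be confirmed with a profiler or a JMH benchmark on the actual workload.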


