[ 
https://issues.apache.org/jira/browse/CASSANDRA-21141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18056558#comment-18056558
 ] 

Chris Lohfink commented on CASSANDRA-21141:
-------------------------------------------

Ran the JMH:

  h4. Before
  {noformat}
  Benchmark                                     (batchSize)  (uniquePartition)   Mode  Cnt         Score     Error   Units
  BatchStatementBench.bench                           10000               true  thrpt    5         0.124 ±   0.003  ops/ms
  BatchStatementBench.bench:gc.alloc.rate             10000               true  thrpt    5      2721.298 ±  64.464  MB/sec
  BatchStatementBench.bench:gc.alloc.rate.norm        10000               true  thrpt    5  23078519.405 ± 121.630    B/op
  BatchStatementBench.bench:gc.count                  10000               true  thrpt    5      1529.000            counts
  BatchStatementBench.bench:gc.time                   10000               true  thrpt    5      1573.000                ms
  BatchStatementBench.bench                           10000              false  thrpt    5         0.273 ±   0.009  ops/ms
  BatchStatementBench.bench:gc.alloc.rate             10000              false  thrpt    5      2765.582 ±  96.105  MB/sec
  BatchStatementBench.bench:gc.alloc.rate.norm        10000              false  thrpt    5  10635657.965 ± 353.444    B/op
  BatchStatementBench.bench:gc.count                  10000              false  thrpt    5      1442.000            counts
  BatchStatementBench.bench:gc.time                   10000              false  thrpt    5       685.000                ms
  {noformat}

  h4. After
  {noformat}
  Benchmark                                     (batchSize)  (uniquePartition)   Mode  Cnt         Score     Error   Units
  BatchStatementBench.bench                           10000               true  thrpt    5         0.144 ±   0.028  ops/ms
  BatchStatementBench.bench:gc.alloc.rate             10000               true  thrpt    5      2692.812 ± 520.233  MB/sec
  BatchStatementBench.bench:gc.alloc.rate.norm        10000               true  thrpt    5  19558500.522 ± 107.450    B/op
  BatchStatementBench.bench:gc.count                  10000               true  thrpt    5      1445.000            counts
  BatchStatementBench.bench:gc.time                   10000               true  thrpt    5      1464.000                ms
  BatchStatementBench.bench                           10000              false  thrpt    5         0.315 ±   0.011  ops/ms
  BatchStatementBench.bench:gc.alloc.rate             10000              false  thrpt    5      2162.397 ±  76.545  MB/sec
  BatchStatementBench.bench:gc.alloc.rate.norm        10000              false  thrpt    5   7195656.967 ± 341.927    B/op
  BatchStatementBench.bench:gc.count                  10000              false  thrpt    5      1128.000            counts
  BatchStatementBench.bench:gc.time                   10000              false  thrpt    5       531.000                ms
  {noformat}


  The {{uniquePartition=false}} case (same partition batch inserts) shows the largest improvement:
  * *32% reduction* in allocations per operation (~3.4 MB saved per 10k row batch)
  * *22% reduction* in GC time
  * *15% improvement* in throughput
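
The percentages above can be re-derived from the two tables. The following is a minimal sketch of that arithmetic, not part of the benchmark itself; the class name {{JmhDeltaSketch}} and the method {{percentChange}} are hypothetical, and the inputs are copied from the {{uniquePartition=false}} rows.

```java
final class JmhDeltaSketch {
    /** Relative change from before to after, in percent (negative means a reduction). */
    static double percentChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        // Figures copied from the uniquePartition=false rows of the Before/After tables.
        System.out.printf("gc.alloc.rate.norm: %.1f%%%n", percentChange(10635657.965, 7195656.967));
        System.out.printf("gc.time:            %.1f%%%n", percentChange(685.0, 531.0));
        System.out.printf("throughput:         %.1f%%%n", percentChange(0.273, 0.315));
    }
}
```

Applying the same calculation to the {{uniquePartition=true}} rows gives a smaller allocation reduction (roughly 15% per operation), and the throughput error bars for that case overlap (0.124 ± 0.003 before, 0.144 ± 0.028 after), so the throughput change there is less conclusive.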


> Reduce memory allocation during transformation of BatchStatement to Mutation
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21141
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21141
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: CQL/Interpreter
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: CASSANDRA-21141_alloc.html, CASSANDRA-21141_cpu.html, 
> CASSANDRA-21141_wall.html, batch_profile_seq.yaml, cassandra.yaml, 
> image-2026-01-28-09-39-38-183.png, jvm-server.options, jvm17-server.options, 
> trunk_alloc.html, trunk_cpu.html, trunk_wall.html
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> We allocate a lot of objects during the transformation of a BatchStatement to a 
> Mutation. In many typical scenarios we can take a fast path and reduce the 
> number of allocated objects (as well as make the corresponding logic faster).
> Allocation flame graph:
> !image-2026-01-28-09-39-38-183.png|width=600!
>  * [^trunk_alloc.html]
>  * [^trunk_cpu.html]
>  * [^trunk_wall.html]
> Suggested optimisations:
>  * force hash3_x64_128 inlining to help JIT with escape analysis and long[] 
> heap allocation elimination, so the hash function value (long[2]) is not 
> allocated on heap - 
> [link|https://github.com/apache/cassandra/pull/4589/changes/02d5ae650c9581ea061fb1255e2078a278697b6d]
>  * serializedRowBodySize: avoid a capturing lambda allocation per cell by 
> moving the captured arguments to SerializationHelper (the same optimization 
> that was done in serializeRowBody for flushing some time ago) - 
> [link|https://github.com/apache/cassandra/pull/4589/changes/34a3d7126351630eb91be1ba9546a6e3c84d9359]
>  * UpdateParameters: allocate DeletionTime on demand (it is not needed if we 
> do insert/updates) - 
> [link|https://github.com/apache/cassandra/pull/4589/changes/f8f57ea14f0c40fabb0f049a79146f403c88a009]
>  * Add fast path in valuesAsClustering logic for the typical scenario when we 
> specify a single clustering key (a single row) to modify -
> [link|https://github.com/apache/cassandra/pull/4589/changes/e11961cf457a4545951cbfa0d20e2b929d5ae453]
>  * Add fast path in nonTokenRestrictionValues logic for the typical scenario 
> when we specify a single partition key (a single row) to modify; also 
> optimize the case where a partition or clustering key is a single column - 
> [link|https://github.com/apache/cassandra/pull/4589/changes/b7fe9cc34c0a6c0c3d20b12fc2ccd8a11f98f460]
>  * BatchStatement: check if many similar rows for the same table are written 
> unconditionally, in this case we can avoid columns info merging and builders 
> allocation - 
> [link|https://github.com/apache/cassandra/pull/4589/changes/d011dfa68b88fa2d52c9a661d4945c719febf1d5]
>  * Avoid ClusteringIndexSliceFilter allocation if a write does not require a 
> read (plain usual write), avoid iterator allocation, use an array instead of 
> an ArrayList for perStatementOptions, which does not grow dynamically - 
> [link|https://github.com/apache/cassandra/pull/4589/changes/0e4abb36105457d7f5e630d6e8b40b560794ae2e]
> Profiles after the changes:
>  * [^CASSANDRA-21141_alloc.html]
>  * [^CASSANDRA-21141_cpu.html]
>  * [^CASSANDRA-21141_wall.html]
> Heap allocations in a batch write test are reduced by ~20% (see the comment 
> with test results for more details)
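
One of the suggested optimisations in the description above replaces a capturing lambda with state carried by a helper object. The following is a minimal sketch of that general technique under stated assumptions; it is not the code from the linked commits. All names here ({{CapturingLambdaSketch}}, {{Cell}}, {{Helper}}, {{CellAccumulator}}, {{deltaSize}}) are hypothetical stand-ins and do not mirror Cassandra's actual classes or signatures.

```java
import java.util.Arrays;
import java.util.List;

final class CapturingLambdaSketch {
    /** Hypothetical stand-in for a cell; not Cassandra's Cell class. */
    static final class Cell {
        final int valueSize;
        final long timestamp;
        Cell(int valueSize, long timestamp) { this.valueSize = valueSize; this.timestamp = timestamp; }
    }

    /** Hypothetical helper that carries per-call state, so the lambda need not capture it. */
    static final class Helper {
        final long minTimestamp;
        Helper(long minTimestamp) { this.minTimestamp = minTimestamp; }
    }

    /** Accumulator that receives its context as an explicit argument. */
    interface CellAccumulator<A> {
        long apply(A arg, Cell cell, long acc);
    }

    static <A> long accumulate(List<Cell> cells, A arg, CellAccumulator<A> fn) {
        long acc = 0;
        for (int i = 0; i < cells.size(); i++)   // indexed loop, no Iterator object
            acc = fn.apply(arg, cells.get(i), acc);
        return acc;
    }

    /** Toy size function: bytes needed for a non-negative delta at 7 bits per byte. */
    static int deltaSize(long delta) {
        int bytes = 1;
        while ((delta >>>= 7) != 0) bytes++;
        return bytes;
    }

    // Capturing variant: the lambda closes over minTimestamp, so evaluating it
    // typically creates a new object on each call unless the JIT eliminates it.
    static long sizeCapturing(List<Cell> cells, long minTimestamp) {
        CellAccumulator<Void> fn =
            (unused, cell, acc) -> acc + cell.valueSize + deltaSize(cell.timestamp - minTimestamp);
        return accumulate(cells, null, fn);
    }

    // Non-capturing variant: the lambda uses only its parameters, so a single
    // instance stored in a constant can be reused for every call.
    private static final CellAccumulator<Helper> SIZE_FN =
        (helper, cell, acc) -> acc + cell.valueSize + deltaSize(cell.timestamp - helper.minTimestamp);

    static long sizeNonCapturing(List<Cell> cells, Helper helper) {
        return accumulate(cells, helper, SIZE_FN);
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(new Cell(10, 105L), new Cell(20, 300L));
        System.out.println(sizeCapturing(cells, 100L));
        System.out.println(sizeNonCapturing(cells, new Helper(100L)));
    }
}
```

Both variants compute the same result; the difference is only in where the captured value lives. Whether the capturing form actually allocates depends on JIT escape analysis, which is why the effect is best confirmed with an allocation profile such as the ones attached to the issue.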


