[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

Thomas Steinmaurer (Jira) Wed, 15 Jan 2020 23:33:35 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016628#comment-17016628 ]


Thomas Steinmaurer commented on CASSANDRA-15430:
------------------------------------------------

[~benedict], please try to download the JFR files for both 2.1.18 and 3.0.18 
here: 
[https://dynatrace-my.sharepoint.com/:f:/p/thomas_steinmaurer/EoFkdBH-WnlOmuGZ4hL_8PwByBTQLwhtlBGBLW_0y3P9rg?e=uKlr6W]

The data model is pretty straightforward. It originates from our 
Astyanax/Thrift legacy days and was moved over to CQL as a BLOB-centric model, 
used together with our client-side "serializer framework".

E.g.:
{noformat}
CREATE TABLE ks."cf" (
    k blob,
    n blob,
    v blob,
    PRIMARY KEY (k, n)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (n ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '2'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 259200
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';
{noformat}

Regarding queries: it is really just about the write path (batch message 
processing) in Cassandra 2.1 vs. 3.0, as outlined in the issue description. We 
have tested single-partition batches vs. multi-partition batches (I know, bad 
practice), but in our tests single-partition batches did not improve the write 
path in 3.0 either.
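
As a rough illustration of the two batch shapes compared above, here is a 
sketch against the example table. The blob literals are placeholders, and 
UNLOGGED is an assumption, since the batch type is not stated here; the real 
statements are produced by the client and are not reproduced in this comment.
{noformat}
-- Single-partition batch: every statement targets the same partition key k.
BEGIN UNLOGGED BATCH
    INSERT INTO ks."cf" (k, n, v) VALUES (0x01, 0x0a, 0xff);
    INSERT INTO ks."cf" (k, n, v) VALUES (0x01, 0x0b, 0xff);
APPLY BATCH;

-- Multi-partition batch: statements target different partition keys.
BEGIN UNLOGGED BATCH
    INSERT INTO ks."cf" (k, n, v) VALUES (0x01, 0x0a, 0xff);
    INSERT INTO ks."cf" (k, n, v) VALUES (0x02, 0x0a, 0xff);
APPLY BATCH;
{noformat}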

For us, moving from 2.1 to 3.0 would mean adding ~30-40% more resources to 
handle the same load. Thanks for any help in that area!

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15430
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Thomas Steinmaurer
>            Priority: Normal
>         Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png
>
>
> In a 6-node load-test cluster, we have been running a production-like 
> workload on 2.1.18 continuously and with sufficient capacity. After upgrading 
> one node to 3.0.18 (the remaining 5 are still on 2.1.18, since we saw the 
> regression described below), the 3.0.18 node shows increased CPU usage, 
> increased GC, high mutation stage pending tasks, dropped mutation messages ...
> Some specs. All 6 nodes are equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus same settings in regard to number 
> of threads, compaction throttling etc.
> The following dashboard shows highlighted areas (CPU, suspension) with 
> metrics for all 6 nodes; the one outlier is the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally, we see a large increase in pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool Name                    Active   Pending      Completed   Blocked  All Time Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - MutationStage                   256     81824     3360532756         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ViewMutationStage                 0         0              0         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ReadStage                         0         0       62862266         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - RequestResponseStage              0         0     2176659856         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - ReadRepairStage                   0         0              0         0                 0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - CounterMutationStage              0         0              0         0                 0
> ...
> {noformat}
> Judging from a 15-minute JFR session on each version (3.0.18, and 2.1.18 on 
> a different node), it looks at a high level like the code path underneath 
> {{BatchMessage.execute}} is producing ~10x more on-heap allocations in 
> 3.0.18 than in 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> The zipped JFRs exceed the 60MB limit for attaching directly to the ticket. 
> I can upload them if another destination is available.



