[ https://issues.apache.org/jira/browse/CASSANDRA-13900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180759#comment-16180759 ]
Jeremiah Jordan commented on CASSANDRA-13900:
---------------------------------------------

It is supported; 3.0 and 3.11 support upgrades from the same versions. But yes, it is a much bigger upgrade step. I was just wondering whether you had tried 3.11 (possibly on a test cluster) to see if you hit the same issues.

> Massive GC suspension increase after updating to 3.0.14 from 2.1.18
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-13900
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Thomas Steinmaurer
>            Priority: Blocker
>         Attachments: cassandra2118_vs_3014.jpg, cassandra3014_jfr_5min.jpg
>
> In short: after upgrading to 3.0.14 (from 2.1.18), we are no longer able to
> process the same incoming write load on the same infrastructure.
>
> We have a loadtest environment running 24x7, testing our software with
> Cassandra as the backend. Both loadtest and production are hosted in AWS and
> have the same spec on the Cassandra side, namely:
> * 9x m4.xlarge
> * 8G heap per node
> * CMS (400MB newgen)
> * 2TB EBS gp2 per node
> * client requests are entirely CQL
> We have had a solid/constant baseline in loadtest at ~60% cluster-average
> CPU, with constant, simulated load running against the cluster, on
> Cassandra 2.1 for > 2 years now.
>
> Recently we started upgrading this 9-node loadtest environment to 3.0.14,
> and basically 3.0.14 cannot cope with the load anymore. There are no special
> tweaks or memory-setting changes; everything is configured the same as with
> 2.1.18. We also have not upgraded sstables yet, so the increase shown in the
> screenshot is not related to any manually triggered maintenance operation
> after the upgrade.
>
> According to our monitoring, with 3.0.14 we see a *GC suspension time
> increase by a factor of > 2*, of course directly correlating with a CPU
> increase to > 80%. See the attached screenshot "cassandra2118_vs_3014.jpg".
>
> All of this means that the incoming load 2.1.18 handled is load 3.0.14
> cannot handle. We would need to either scale up (e.g. m4.xlarge =>
> m4.2xlarge) or scale out to handle the same load, which is not an option
> cost-wise.
>
> Unfortunately I do not have Java Flight Recorder runs for 2.1.18 at the
> mentioned load, but I can provide a JFR session for our current 3.0.14
> setup. The attached 5-minute JFR memory-allocation view
> (cassandra3014_jfr_5min.jpg) shows compaction as the top allocation
> contributor for the captured 5-minute time frame. That could be an
> "accident" of the capture window being dominated by compaction (although the
> mentioned simulated client load was running), but according to the stack
> traces we see new classes from 3.0, e.g. BTreeRow.searchIterator(), popping
> up as top contributors, so the new classes / data structures are possibly
> causing much more object churn now.
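A note on the quoted "8G heap / CMS (400MB newgen)" spec: in stock 2.1/3.0 packaging these values map onto conf/cassandra-env.sh, where MAX_HEAP_SIZE and HEAP_NEWSIZE drive the -Xmx/-Xmn flags. A minimal sketch, with the values taken from the ticket and everything else left at stock defaults:

    # conf/cassandra-env.sh -- pin the heap to the values from the report
    MAX_HEAP_SIZE="8G"     # expands to -Xms8G -Xmx8G
    HEAP_NEWSIZE="400M"    # expands to -Xmn400M (the CMS new generation)
    # CMS is the default collector on 2.1/3.0; spelled out explicitly it is:
    # JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC -XX:+UseConcMarkSweepGC"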
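On quantifying the GC suspension increase: the ticket does not say which tooling produced the monitoring graphs, but one way to cross-check the numbers on any node is nodetool gcstats plus the JVM's own GC log (the log path below is an assumption):

    # GC pause summary since the last invocation of this command
    nodetool gcstats

    # Or enable JDK 8 GC logging in cassandra-env.sh ...
    # JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
    # JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/cassandra/gc.log"
    # ... and total up the reported stop-the-world pauses:
    grep 'Total time for which application threads were stopped' /var/log/cassandra/gc.log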
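And for the JFR capture itself: a sketch of how a 5-minute allocation recording like the one behind cassandra3014_jfr_5min.jpg could be produced on Oracle JDK 8, where JFR was still a commercial feature (the flags and paths here are illustrative assumptions, not taken from the ticket):

    # One-time flags in cassandra-env.sh to allow recordings:
    # JVM_OPTS="$JVM_OPTS -XX:+UnlockCommercialFeatures -XX:+FlightRecorder"

    # Record 5 minutes from the running daemon, then open the file in Java Mission Control:
    jcmd $(pgrep -f CassandraDaemon) JFR.start duration=5m filename=/tmp/cassandra3014.jfr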