Thomas Steinmaurer created CASSANDRA-13900:
----------------------------------------------
Summary: Massive GC suspension increase after updating to 3.0.14
from 2.1.18
Key: CASSANDRA-13900
URL: https://issues.apache.org/jira/browse/CASSANDRA-13900
Project: Cassandra
Issue Type: Bug
Reporter: Thomas Steinmaurer
Priority: Blocker
Attachments: cassandra2.1.18_vs_3.0.14.png
In short: After upgrading from 2.1.18 to 3.0.14, we are no longer able to
process the same incoming write load on the same infrastructure.
We have a loadtest environment running 24x7, testing our software using
Cassandra as backend. Both loadtest and production are hosted in AWS and
have the same spec on the Cassandra side, namely:
* 9x m4.xlarge
* 8G heap
* CMS (400MB newgen)
* 2TB EBS gp2
per node. We have been using Cassandra 2.1 for > 2 years now, and loadtest
shows a solid, constant baseline of ~ 60% CPU cluster AVG with constant,
simulated load running against our cluster.
Recently we started to upgrade to 3.0.14 in this 9 node loadtest environment,
and basically, 3.0.14 isn't able to cope with the load anymore. We applied no
special tweaks or memory setting changes; everything is the same as in 2.1.18.
We also didn't upgrade sstables yet, so the increase mentioned below is not
related to any manually triggered maintenance operation after upgrading to
3.0.14.
According to our monitoring, with 3.0.14, we see a GC suspension time increase
by a factor of > 2, of course directly correlating with a CPU increase > 80%.
!cassandra2.1.18_vs_3.0.14.png|thumbnail!
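For anyone who wants to cross-check the suspension factor from GC logs rather than from monitoring, here is a minimal sketch. It assumes the nodes run a Java 8 HotSpot JVM with -XX:+PrintGCApplicationStoppedTime enabled, so that the GC log contains "Total time for which application threads were stopped" lines. The function names are hypothetical, and this is not how the graph above was produced.

```python
import re

# Safepoint pause line written by HotSpot (Java 8) when
# -XX:+PrintGCApplicationStoppedTime is enabled (assumption: this flag is on).
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"([0-9]+[.,][0-9]+) seconds"
)


def total_stopped_seconds(lines):
    """Sum the stopped time (in seconds) over all matching GC log lines."""
    total = 0.0
    for line in lines:
        match = STOPPED.search(line)
        if match:
            # Some locales print a decimal comma instead of a decimal point.
            total += float(match.group(1).replace(",", "."))
    return total


def suspension_ratio(old_lines, new_lines):
    """Return the new total stopped time divided by the old total stopped time.

    Both inputs should cover time windows of equal length under the same load,
    otherwise the ratio is not meaningful.
    """
    old_total = total_stopped_seconds(old_lines)
    new_total = total_stopped_seconds(new_lines)
    if old_total == 0:
        raise ValueError("no stopped time found in the old log")
    return new_total / old_total
```

Both functions take any iterable of lines, so an open GC log file can be passed directly. A ratio above 2 for equally long windows of a 2.1.18 log and a 3.0.14 log would match what the monitoring shows.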
In other words, the incoming load that 2.1.18 has been handling for several
weeks now is more than 3.0.14 can handle. So, we would need to either scale up
(e.g. to m4.2xlarge) or scale out to be able to handle the same load.