[ https://issues.apache.org/jira/browse/CASSANDRA-14840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661748#comment-16661748 ]
Jeff Jirsa commented on CASSANDRA-14840:
----------------------------------------

This is a duplicate of CASSANDRA-11748 and/or CASSANDRA-13569. What's happening is that when the new instance comes online, it pulls schema from all of the other instances in the cluster simultaneously, receiving 80+ copies of what's probably a very large schema at the same time.

If you really have no data in any of those tables, the easiest solution may be to start removing them. That decreases the schema size and makes the thundering herd of schema mutations less painful. This may be a viable option if the tables are old and unused; if you expect to use them again, keep reading.

Beyond that, you have two options:

1) Try to make it so you can better handle all of the incoming mutations. This may mean a bigger heap, tuning the memtable, or similar. It's hard to give concrete suggestions without a heap dump and knowing your current settings. An off-heap memtable may be a starting point, given you're on 2.1.

2) Try to limit the number of concurrent migrations. This is going to sound awful, for obvious reasons, but one thing that may work is to artificially restrict your instance's view of the ring using firewall rules, so it can only communicate with a handful of hosts (maybe just the seeds) for the first 5-15 seconds after it starts. Once it has the schema, remove the rules so it can talk to the rest of the cluster and properly bootstrap.

One of the other two JIRAs will eventually get addressed; I'm going to dupe this to CASSANDRA-11748 since it's a lower number (earlier reporting).
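To make the thundering-herd arithmetic and the idea behind option 2 concrete, here is a small hypothetical Java model. It is not Cassandra's schema-migration code: the class and method names are invented for illustration, each "peer" is a thread, and a `java.util.concurrent.Semaphore` stands in for the firewall trick by capping how many schema copies the joining node holds at once. The per-table byte size passed to the estimate is an assumed input, not a measured Cassandra figure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model, NOT Cassandra code: a joining node receives a full schema copy
// from every peer, and each copy occupies heap until it has been applied.
final class SchemaPullSimulation {

    // Rough peak-heap estimate: copies held at once x tables per copy x bytes per table.
    // bytesPerTable is an assumed input for illustration only.
    static long estimatedPeakBytes(int peers, int maxConcurrent, int tables, long bytesPerTable) {
        long copiesAtOnce = Math.min(peers, maxConcurrent);
        return copiesAtOnce * tables * bytesPerTable;
    }

    // Runs one simulated pull per peer, allowing at most maxConcurrent copies in memory
    // at once, and returns the largest number of copies observed in flight simultaneously.
    static int peakCopiesInFlight(int peers, int maxConcurrent) {
        Semaphore permits = new Semaphore(maxConcurrent);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(peers);
        List<Future<?>> results = new ArrayList<>();
        try {
            for (int i = 0; i < peers; i++) {
                results.add(pool.submit(() -> {
                    start.await();     // every peer starts sending at the same moment
                    permits.acquire(); // wait for a free slot before holding a copy
                    try {
                        peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                        Thread.sleep(10); // stand-in for receiving and applying one copy
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release();
                    }
                    return null;
                }));
            }
            start.countDown();
            for (Future<?> f : results) {
                f.get(); // surface any failure from a simulated pull
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        int peers = 80; // roughly the cluster size at which the report says bootstrap fails
        System.out.println("all peers at once: peak copies = " + peakCopiesInFlight(peers, peers));
        System.out.println("limited to 3:      peak copies = " + peakCopiesInFlight(peers, 3));
    }
}
```

In the unrestricted run the peak typically lands at or near 80 (the exact value depends on thread scheduling), while the restricted run can never exceed 3, because a copy is only counted after a permit is acquired. Under this model, peak memory scales with min(peers, limit) rather than with cluster size, which is the intuition behind both removing tables (smaller copies) and restricting the node's view of the ring (fewer simultaneous copies).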
> Bootstrap of new node fails with OOM in a large cluster
> -------------------------------------------------------
>
>                 Key: CASSANDRA-14840
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14840
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Jai Bheemsen Rao Dhanwada
>            Priority: Critical
>
> We are seeing new node additions fail with OOM during bootstrap in a cluster
> of more than 80 nodes and 3000 CFs, with no data in those CFs.
>
> Steps to reproduce:
> # Launch a 3-node cluster
> # Create 3000 CFs in the cluster
> # Start adding nodes to the cluster one by one
> # After adding 75-80 nodes, the new node's bootstrap fails with OOM.
> {code:java}
> ERROR [PERIODIC-COMMIT-LOG-SYNCER] 2018-10-24 03:26:47,870 JVMStabilityInspector.java:78 - Exiting due to error while processing commit log during initialization.
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.regex.Pattern.matcher(Pattern.java:1093) ~[na:1.8.0_151]
>         at java.util.Formatter.parse(Formatter.java:2547) ~[na:1.8.0_151]
>         at java.util.Formatter.format(Formatter.java:2501) ~[na:1.8.0_151]
>         at java.util.Formatter.format(Formatter.java:2455) ~[na:1.8.0_151]
>         at java.lang.String.format(String.java:2940) ~[na:1.8.0_151]
>         at org.apache.cassandra.db.commitlog.AbstractCommitLogService$1.run(AbstractCommitLogService.java:105) ~[apache-cassandra-2.1.16.jar:2.1.16]
>         at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]
> {code}
> Cassandra Version: 2.1.16
> OS: CentOS7
> num_tokens: 256 on each node.
>
> This behavior is blocking us from adding extra capacity when needed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)