Re: Nodes just dieing with OOM

2017-10-07 Thread Alain RODRIGUEZ
Hi Brian,

Happy to know that the problem was (temporarily?) solved.

> We're migrating from i2.xl (32GB ram, Local SSD) to m4.xl (16gb, gp2) so we
> have a mix there, Cassandra JVM set to 10GB

To prevent these unpredictable mixes of hardware, I usually update hardware by adding a new data center,

Re: Nodes just dieing with OOM

2017-10-06 Thread Brian Spindler
Hi Alain, thanks for getting back to me. I will read through those articles. The truncate did solve the problem. I am using Cassandra 2.1.15. I'll look at cfstats in more detail; we've got some charting from JVM metrics, yeah. We're migrating from i2.xl (32GB RAM, local SSD) to m4.xl (16GB, gp2)
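For context on the heap size mentioned in this exchange, here is a minimal sketch of the default max-heap arithmetic, assuming the stock cassandra-env.sh rule of max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB)) and assuming the instances expose exactly 32GB and 16GB to the OS. The function and variable names are made up for illustration; check the cassandra-env.sh shipped with your version before relying on the numbers.

```python
def default_max_heap_mb(system_memory_mb):
    """Default max heap, assuming the stock cassandra-env.sh rule:
    max(min(1/2 RAM, 1024MB), min(1/4 RAM, 8192MB))."""
    half_capped = min(system_memory_mb // 2, 1024)
    quarter_capped = min(system_memory_mb // 4, 8192)
    return max(half_capped, quarter_capped)


# Heap size quoted in the thread ("Cassandra JVM set to 10GB").
CONFIGURED_HEAP_MB = 10 * 1024

# RAM sizes as quoted in the thread; real usable memory may be slightly lower.
INSTANCES = (("i2.xl", 32), ("m4.xl", 16))

if __name__ == "__main__":
    for name, ram_gb in INSTANCES:
        ram_mb = ram_gb * 1024
        default_mb = default_max_heap_mb(ram_mb)
        outside_heap_mb = ram_mb - CONFIGURED_HEAP_MB
        print(
            f"{name}: default heap {default_mb} MB, "
            f"RAM left outside a 10GB heap {outside_heap_mb} MB"
        )
```

Under those assumptions, a 10GB heap is above the script's default on both instance types, and on the 16GB machines it leaves much less memory for off-heap structures and the page cache than on the 32GB ones.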

Re: Nodes just dieing with OOM

2017-10-06 Thread Alain RODRIGUEZ
Hello Brian.

Sorry to hear that, it looks like a lot of trouble.

> I think we should review this column family design so it doesn't generate
> so many tombstones? Could that be the cause?

It could be indeed. Did truncating solve the issue? There are some nicer approaches you can try to handle tombstones

Re: Nodes just dieing with OOM

2017-10-06 Thread Brian Spindler
Sorry about that. We eventually found that one column family had some large/corrupt data that was causing OOMs. Luckily, it was a pretty ephemeral data set and we were able to just truncate it. However, it was a guess based on some log messages about reading a large number of tombstones on that
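To make that kind of guess a bit less of a guess, the tombstone warnings can be tallied per table. The sketch below assumes the warning lines look roughly like "Read N live and M tombstoned cells in keyspace.table (see tombstone_warn_threshold)"; the exact wording varies between Cassandra versions, so compare the pattern with your own system.log first. The sample lines and table names are invented for illustration and are not from this cluster.

```python
import re
from collections import Counter

# Assumed shape of the tombstone warning; adjust to match your system.log.
TOMBSTONE_RE = re.compile(
    r"Read (?P<live>\d+) live and (?P<dead>\d+) tombstoned? cells in (?P<table>[\w.]+)"
)


def tombstones_per_table(lines):
    """Sum the tombstone counts reported in warning lines, keyed by keyspace.table."""
    totals = Counter()
    for line in lines:
        match = TOMBSTONE_RE.search(line)
        if match:
            totals[match.group("table")] += int(match.group("dead"))
    return totals


# Hypothetical log lines, made up for this example.
SAMPLE_LOG = [
    "WARN  [SharedPool-Worker-1] Read 12 live and 45000 tombstoned cells in ks1.events (see tombstone_warn_threshold).",
    "INFO  [CompactionExecutor:3] Compacted 4 sstables",
    "WARN  [SharedPool-Worker-2] Read 0 live and 5000 tombstoned cells in ks1.events (see tombstone_warn_threshold).",
    "WARN  [SharedPool-Worker-4] Read 3 live and 1200 tombstoned cells in ks1.sessions (see tombstone_warn_threshold).",
]

if __name__ == "__main__":
    # Print tables from most to fewest tombstones read.
    for table, count in tombstones_per_table(SAMPLE_LOG).most_common():
        print(f"{table}: {count} tombstones read")
```

To run it against a real log, pass an open file object instead of SAMPLE_LOG, for example `tombstones_per_table(open(path))` with `path` pointing at your own log file.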

Nodes just dieing with OOM

2017-10-06 Thread Brian Spindler
Hi guys, our cluster (around 18 nodes) just started having nodes die, and when we restart them they die again with OOM. How can we handle this? I've tried adding a couple of extra gigs on these machines, but it's not helping. Help! -B