Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-17 Thread Arne Claassen
Ok, tonight we rolled out on the production cluster. This one has 4 nodes, and we dropped and recreated the keyspace before re-processing to avoid all possibility of [...] Everything seemed OK, even though the CPU load was pegged and we saw lots of dropped MUTATION messages, but after all the reprocessing

100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
I have a three node cluster that has been sitting at a load of 4 (for each node) and 100% CPU utilization (although 92% nice) for the last 12 hours, ever since some significant writes finished. I'm trying to determine what tuning I should be doing to get it out of this state. The debug log is just

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Jonathan Lacefield
Hello, What version of Cassandra are you running? If it's 2.0, we recently experienced something similar with 8447 [1], which 8485 [2] should hopefully resolve. Please note that 8447 is not related to tombstones. Tombstone processing can put a lot of pressure on the heap as well. Why do

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
What's heap usage at? On Tue, Dec 16, 2014 at 1:04 PM, Arne Claassen a...@emotient.com wrote: I have a three node cluster that has been sitting at a load of 4 (for each node) and 100% CPU utilization (although 92% nice) for the last 12 hours, ever since some significant writes finished. I'm

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
I'm running 2.0.10. The data is all time series data, and as we change our pipeline we've been periodically reprocessing the data sources, which causes each time series to be overwritten, i.e. every row per partition key is deleted and re-written, so I assume I've been collecting a bunch of

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
What's CPU, RAM, storage layer, and data density per node? Exact heap settings would be nice. In the logs, look for TombstoneOverwhelmingException. On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen a...@emotient.com wrote: I'm running 2.0.10. The data is all time series data and as we change our

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
AWS r3.xlarge, 30GB RAM, but only using a heap of 10GB, new gen 2GB, because we might go c3.2xlarge instead if CPU is more important than RAM. Storage is EBS-optimized SSD (but iostat shows no real IO going on). Each node only has about 10GB of data, with ownership of 67%, 64.7%, 68.3%. The node on which I set the

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Sorry, I meant a 15GB heap on the one machine that has less nice CPU% now. The others are 6GB. On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen a...@emotient.com wrote: AWS r3.xlarge, 30GB RAM, but only using a heap of 10GB, new gen 2GB, because we might go c3.2xlarge instead if CPU is more important than

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Changed the 15GB node to a 25GB heap and the nice CPU is down to ~20% now. Checked my dev cluster to see if the ParNew log entries are just par for the course, but I'm not seeing them there. However, both clusters have the following every 30 seconds: DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So a heap of that size without some tuning will create a number of problems (high CPU usage being one of them). I suggest either an 8GB heap and 400MB parnew (which I'd only set that low for that low a CPU count), or attempt the tunings indicated in https://issues.apache.org/jira/browse/CASSANDRA-8150 On
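
A minimal sketch of how that sizing would look in conf/cassandra-env.sh (the 8G/400M figures come from this message; the variable names are the standard ones in that file), applied per node and followed by a rolling restart:

    # conf/cassandra-env.sh -- per-node JVM sizing suggested in this thread
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="400M"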

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
Also, based on the replayed batches... are you using batches to load data? On Tue, Dec 16, 2014 at 3:12 PM, Ryan Svihla rsvi...@datastax.com wrote: So heap of that size without some tuning will create a number of problems (high cpu usage one of them), I suggest either 8GB heap and 400mb parnew

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
The starting configuration I had, which is still running on two of the nodes, was a 6GB heap with 1024MB parnew, which is close to what you are suggesting, and those have been pegged at load 4 for over 12 hours with hardly any read or write traffic. I will set one to 8GB/400MB and see if its load

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So 1024 is still a good 2.5 times what I'm suggesting, and 6GB is hardly enough to run Cassandra well in, especially if you're going full bore on loads. However, you may just flat out be CPU bound on your write throughput; how many TPS and what size writes do you have? Also, what is your widest row?
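
A sketch of how the write-size and widest-row questions could be answered with nodetool (the keyspace/table names "media" and "frames" are placeholders, since the real ones aren't given here):

    # Per-table stats (look for the compacted row maximum size under your table)
    nodetool cfstats
    # Row size and column count histograms for a specific table
    nodetool cfhistograms media frames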

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Actually, not sure why the machine was originally configured at 6GB, since we even started it on an r3.large with 15GB. Re: batches: not using batches. I actually have that as a separate question on the list. Currently I fan out async single inserts, and I'm wondering if batches are better since my

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
Can you define what "virtually no traffic" is? Sorry to be repetitive about that, but I've worked on a lot of clusters in the past year and people have wildly different ideas of what that means. Unlogged batches for the same partition key are definitely a performance optimization. Typically async is much
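
A minimal sketch of grouping writes for one partition key into an unlogged batch; the table and non-key columns are invented for illustration, and only the PRIMARY KEY shape quoted later in the thread is real:

    cqlsh <<'CQL'
    BEGIN UNLOGGED BATCH
      INSERT INTO media.frames (id, trackid, timestamp, score) VALUES ('media-1', 1, 100, 0.42);
      INSERT INTO media.frames (id, trackid, timestamp, score) VALUES ('media-1', 1, 101, 0.44);
    APPLY BATCH;
    CQL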

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
No problem with the follow-up questions. I'm on a crash course here trying to understand what makes C* tick, so I appreciate all feedback. We reprocessed all media (1200 partition keys) last night, where each partition key had somewhere between 4k and 200k rows. After that completed, no traffic went to

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
Ok, based on those numbers I have a theory... can you show me nodetool tpstats for all 3 nodes? On Tue, Dec 16, 2014 at 4:04 PM, Arne Claassen a...@emotient.com wrote: No problem with the follow-up questions. I'm on a crash course here trying to understand what makes C* tick so I appreciate all

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Of course QA decided to start a test batch (still relatively low traffic), so I hope it doesn't throw the tpstats off too much. Node 1:
Pool Name                  Active   Pending    Completed   Blocked   All time blocked
MutationStage                   0         0     13804928         0

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So you've got some blocked flush writers, but you have an incredibly large number of dropped mutations. Are you using secondary indexes, and if so, how many? What is your flush queue set to? On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen a...@emotient.com wrote: Of course QA decided to start a
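
Two quick checks for the flush question; the yaml path assumes a package-style install, so adjust as needed:

    # Current flush tuning on this node
    grep -E 'memtable_flush_(writers|queue_size)' /etc/cassandra/cassandra.yaml
    # Watch whether the FlushWriter pool keeps blocking
    watch -n 5 'nodetool tpstats | grep -i flush'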

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Not using any secondary indices, and memtable_flush_queue_size is the default 4. But let me tell you how data is mutated right now; maybe that will give you some insight into how this is happening. Basically, the frame data table has the following primary key: PRIMARY KEY ((id), trackid, timestamp)
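
A sketch of the rewrite pattern being described, using the quoted primary key and a hypothetical media.frames table (all names besides the key columns are assumptions):

    cqlsh <<'CQL'
    -- each reprocessing pass deletes the partition, leaving a tombstone behind,
    -- and then re-inserts every row under the same partition key
    DELETE FROM media.frames WHERE id = 'media-1';
    INSERT INTO media.frames (id, trackid, timestamp, score) VALUES ('media-1', 1, 100, 0.42);
    CQL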

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
So a delete is really another write that sticks around for gc_grace_seconds (default 10 days); if you get enough tombstones, it can make managing your cluster a challenge as is. Open up cqlsh, turn on tracing, and try a few queries: how many tombstones are scanned for a given query? It's possible the heap problems
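
A sketch of that tracing check (same placeholder keyspace/table/key as above); the trace output includes a "Read N live and M tombstoned cells" line per read, which is the number to watch:

    cqlsh <<'CQL'
    TRACING ON;
    SELECT * FROM media.frames WHERE id = 'media-1' LIMIT 100;
    CQL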

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
I just did a wide set of selects and ran across no tombstones. But while on the subject of gc_grace_seconds, is there any reason, on a small cluster, not to set it to something low like a single day? It seems like 10 days is only needed for large clusters undergoing long partition splits, or am I
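
If repairs reliably finish well inside a day, the change being discussed is a one-line schema alter (placeholder table name; 86400 seconds = 1 day; deleted data can reappear if a node misses a delete and isn't repaired within that window):

    cqlsh <<'CQL'
    ALTER TABLE media.frames WITH gc_grace_seconds = 86400;
    CQL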

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
Manual forced compactions create more problems than they solve. If you have no evidence of tombstones in your selects (which seems odd; can you share some of the tracing output?), then I'm not sure what it would solve for you. A running compaction could explain a high load; log messages with

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
That's just the thing: there is nothing in the logs except the constant ParNew collections, like: DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888. But the load is staying continuously high.
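
Since the only signal is the GCInspector line (roughly 4.4 GB used of an 8 GB max there), watching the young generation directly can show whether ParNew churn tracks the load; a sketch, assuming jstat from the same JDK is on the PATH:

    pid=$(pgrep -f CassandraDaemon | head -n 1)
    # E = eden occupancy %, YGC/YGCT = young GC count and total time, sampled every second
    jstat -gcutil "$pid" 1000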

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Ryan Svihla
What version of Cassandra? On Dec 16, 2014 6:36 PM, Arne Claassen a...@emotient.com wrote: That's just the thing. There is nothing in the logs except the constant ParNew collections like DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Arne Claassen
Cassandra 2.0.10 and Datastax Java Driver 2.1.1 On Dec 16, 2014, at 4:48 PM, Ryan Svihla rsvi...@datastax.com wrote: What version of Cassandra? On Dec 16, 2014 6:36 PM, Arne Claassen a...@emotient.com wrote: That's just the thing. There is nothing in the logs except the constant ParNew

Re: 100% CPU utilization, ParNew and never completing compactions

2014-12-16 Thread Jens Rantil
Maybe checking which thread(s) are busy would hint at what's going on? (See http://www.boxjar.com/using-top-and-jstack-to-find-the-java-thread-that-is-hogging-the-cpu/.) On Wed, Dec 17, 2014 at 1:51 AM, Arne Claassen a...@emotient.com wrote: Cassandra 2.0.10 and Datastax Java Driver 2.1.1 On Dec 16,
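
The technique behind that link boils down to three commands; a sketch (the tid value has to be filled in by hand from the top output):

    pid=$(pgrep -f CassandraDaemon | head -n 1)
    top -H -p "$pid"       # note the TID of the thread pinning a core, then:
    tid=12345              # <- replace with the TID observed in top
    jstack "$pid" | grep -A 20 "nid=0x$(printf '%x' "$tid")"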