Maybe checking which thread(s) are busy would hint at what's going on? (see http://www.boxjar.com/using-top-and-jstack-to-find-the-java-thread-that-is-hogging-the-cpu/).
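In case that link goes stale, the technique is roughly: list the process's threads with top in thread mode, convert the hottest thread's ID to hex, and find that nid in a jstack dump. A minimal sketch, assuming a single Cassandra JVM and standard Linux tooling:

    # find the Cassandra JVM, then list its busiest threads
    pid=$(pgrep -f CassandraDaemon)
    top -H -b -n 1 -p "$pid" | head -20    # note the TID of the hot thread

    # convert that TID to hex and pull the matching stack from a thread dump
    printf 'nid=0x%x\n' <TID>
    jstack "$pid" | grep -A 20 'nid=0x<hex-tid>'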
On Wed, Dec 17, 2014 at 1:51 AM, Arne Claassen <a...@emotient.com> wrote:
> Cassandra 2.0.10 and Datastax Java Driver 2.1.1
>
> On Dec 16, 2014, at 4:48 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>> What version of Cassandra?
>>
>> On Dec 16, 2014 6:36 PM, "Arne Claassen" <a...@emotient.com> wrote:
>> That's just the thing. There is nothing in the logs except the constant ParNew collections, like:
>>
>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888
>>
>> But the load is staying continuously high.
>>
>> There's always some compaction going on, on just that one table, media_tracks_raw, and those values rarely change (certainly the remaining time is meaningless):
>>
>> pending tasks: 17
>>   compaction type   keyspace   table              completed   total         unit    progress
>>        Compaction   media      media_tracks_raw   444294932   1310653468    bytes   33.90%
>>        Compaction   media      media_tracks_raw   131931354   3411631999    bytes    3.87%
>>        Compaction   media      media_tracks_raw   30308970    23097672194   bytes    0.13%
>>        Compaction   media      media_tracks_raw   899216961   1815591081    bytes   49.53%
>> Active compaction remaining time : 0h27m56s
>>
>> Here's a sample of a query trace:
>>
>> activity | timestamp | source | source_elapsed
>> ---------------------------------------------------------------------------------+--------------+---------------+----------------
>> execute_cql3_query | 00:11:46,612 | 10.140.22.236 | 0
>> Parsing select * from media_tracks_raw where id = 74fe9449-8ac4-accb-a723-4bad024101e3 limit 100; | 00:11:46,612 | 10.140.22.236 | 47
>> Preparing statement | 00:11:46,612 | 10.140.22.236 | 234
>> Sending message to /10.140.21.54 | 00:11:46,619 | 10.140.22.236 | 7190
>> Message received from /10.140.22.236 | 00:11:46,622 | 10.140.21.54 | 12
>> Executing single-partition query on media_tracks_raw | 00:11:46,644 | 10.140.21.54 | 21971
>> Acquiring sstable references | 00:11:46,644 | 10.140.21.54 | 22029
>> Merging memtable tombstones | 00:11:46,644 | 10.140.21.54 | 22131
>> Bloom filter allows skipping sstable 1395 | 00:11:46,644 | 10.140.21.54 | 22245
>> Bloom filter allows skipping sstable 1394 | 00:11:46,644 | 10.140.21.54 | 22279
>> Bloom filter allows skipping sstable 1391 | 00:11:46,644 | 10.140.21.54 | 22293
>> Bloom filter allows skipping sstable 1381 | 00:11:46,644 | 10.140.21.54 | 22304
>> Bloom filter allows skipping sstable 1376 | 00:11:46,644 | 10.140.21.54 | 22317
>> Bloom filter allows skipping sstable 1368 | 00:11:46,644 | 10.140.21.54 | 22328
>> Bloom filter allows skipping sstable 1365 | 00:11:46,644 | 10.140.21.54 | 22340
>> Bloom filter allows skipping sstable 1351 | 00:11:46,644 | 10.140.21.54 | 22352
>> Bloom filter allows skipping sstable 1367 | 00:11:46,644 | 10.140.21.54 | 22363
>> Bloom filter allows skipping sstable 1380 | 00:11:46,644 | 10.140.21.54 | 22374
>> Bloom filter allows skipping sstable 1343 | 00:11:46,644 | 10.140.21.54 | 22386
>> Bloom filter allows skipping sstable 1342 | 00:11:46,644 | 10.140.21.54 | 22397
>> Bloom filter allows skipping sstable 1334 | 00:11:46,644 | 10.140.21.54 | 22408
>> Bloom filter allows skipping sstable 1377 | 00:11:46,644 | 10.140.21.54 | 22429
>> Bloom filter allows skipping sstable 1330 | 00:11:46,644 | 10.140.21.54 | 22441
>> Bloom filter allows skipping sstable 1329 | 00:11:46,644 | 10.140.21.54 | 22452
>> Bloom filter allows skipping sstable 1328 | 00:11:46,644 | 10.140.21.54 | 22463
>> Bloom filter allows skipping sstable 1327 | 00:11:46,644 | 10.140.21.54 | 22475
>> Bloom filter allows skipping sstable 1326 | 00:11:46,644 | 10.140.21.54 | 22488
>> Bloom filter allows skipping sstable 1320 | 00:11:46,644 | 10.140.21.54 | 22506
>> Bloom filter allows skipping sstable 1319 | 00:11:46,644 | 10.140.21.54 | 22518
>> Bloom filter allows skipping sstable 1318 | 00:11:46,644 | 10.140.21.54 | 22528
>> Bloom filter allows skipping sstable 1317 | 00:11:46,644 | 10.140.21.54 | 22540
>> Bloom filter allows skipping sstable 1316 | 00:11:46,644 | 10.140.21.54 | 22552
>> Bloom filter allows skipping sstable 1315 | 00:11:46,644 | 10.140.21.54 | 22563
>> Bloom filter allows skipping sstable 1314 | 00:11:46,644 | 10.140.21.54 | 22572
>> Bloom filter allows skipping sstable 1313 | 00:11:46,644 | 10.140.21.54 | 22583
>> Bloom filter allows skipping sstable 1312 | 00:11:46,644 | 10.140.21.54 | 22594
>> Bloom filter allows skipping sstable 1311 | 00:11:46,644 | 10.140.21.54 | 22605
>> Bloom filter allows skipping sstable 1310 | 00:11:46,644 | 10.140.21.54 | 22616
>> Bloom filter allows skipping sstable 1309 | 00:11:46,644 | 10.140.21.54 | 22628
>> Bloom filter allows skipping sstable 1308 | 00:11:46,644 | 10.140.21.54 | 22640
>> Bloom filter allows skipping sstable 1307 | 00:11:46,644 | 10.140.21.54 | 22651
>> Bloom filter allows skipping sstable 1306 | 00:11:46,644 | 10.140.21.54 | 22663
>> Bloom filter allows skipping sstable 1305 | 00:11:46,644 | 10.140.21.54 | 22674
>> Bloom filter allows skipping sstable 1304 | 00:11:46,644 | 10.140.21.54 | 22684
>> Bloom filter allows skipping sstable 1303 | 00:11:46,644 | 10.140.21.54 | 22696
>> Bloom filter allows skipping sstable 1302 | 00:11:46,644 | 10.140.21.54 | 22707
>> Bloom filter allows skipping sstable 1301 | 00:11:46,644 | 10.140.21.54 | 22718
>> Bloom filter allows skipping sstable 1300 | 00:11:46,644 | 10.140.21.54 | 22729
>> Bloom filter allows skipping sstable 1299 | 00:11:46,644 | 10.140.21.54 | 22740
>> Bloom filter allows skipping sstable 1298 | 00:11:46,644 | 10.140.21.54 | 22752
>> Bloom filter allows skipping sstable 1297 | 00:11:46,644 | 10.140.21.54 | 22763
>> Bloom filter allows skipping sstable 1296 | 00:11:46,644 | 10.140.21.54 | 22774
>> Key cache hit for sstable 1295 | 00:11:46,644 | 10.140.21.54 | 22817
>> Seeking to partition beginning in data file | 00:11:46,644 | 10.140.21.54 | 22842
>> Skipped 0/89 non-slice-intersecting sstables, included 0 due to tombstones | 00:11:46,646 | 10.140.21.54 | 24109
>> Merging data from memtables and 1 sstables | 00:11:46,646 | 10.140.21.54 | 24238
>> Read 101 live and 0 tombstoned cells | 00:11:46,663 | 10.140.21.54 | 41389
>> Enqueuing response to /10.140.22.236 | 00:11:46,663 | 10.140.21.54 | 41831
>> Sending message to /10.140.22.236 | 00:11:46,664 | 10.140.21.54 | 41972
>> Message received from /10.140.21.54 | 00:11:46,671 | 10.140.22.236 | 59498
>> Processing response from /10.140.21.54 | 00:11:46,672 | 10.140.22.236 | 59563
>> Request complete | 00:11:46,704 | 10.140.22.236 | 92781
>>
>> Every query I did had just three mentions of tombstones:
>>   Merging memtable tombstones
>>   Skipped 0/89 non-slice-intersecting sstables, included 0 due to tombstones
>>   Read 101 live and 0 tombstoned cells
>> And unless I misread those, none of them claim that there are any tombstones.
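For reference, a trace like the one above can be captured directly in cqlsh; the session below is a sketch using the table and query from this thread:

    cqlsh> USE media;
    cqlsh:media> TRACING ON;
    cqlsh:media> SELECT * FROM media_tracks_raw
             ... WHERE id = 74fe9449-8ac4-accb-a723-4bad024101e3 LIMIT 100;

The trace prints after the result rows, and past sessions stay queryable in system_traces.sessions and system_traces.events.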
>>
>> On Dec 16, 2014, at 4:26 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>> Manually forced compactions create more problems than they solve. If you have no evidence of tombstones in your selects (which seems odd; can you share some of the tracing output?), then I'm not sure what it would solve for you.
>>>
>>> Compaction running could explain a high load. Log messages with ERROR, WARN, or GCInspector are all meaningful there; I suggest searching JIRA for your version to see if there are any interesting bugs.
>>>
>>> On Tue, Dec 16, 2014 at 6:14 PM, Arne Claassen <a...@emotient.com> wrote:
>>> I just did a wide set of selects and ran across no tombstones. But while on the subject of gc_grace_seconds: is there any reason, on a small cluster, not to set it to something low like a single day? It seems like 10 days is only needed for large clusters undergoing long partition splits, or am I misunderstanding gc_grace_seconds?
>>>
>>> Now, given all that, does any of this explain a high load when the cluster is idle? Is it compaction catching up, and would manually forced compaction alleviate that?
>>>
>>> thanks,
>>> arne
>>>
>>> On Dec 16, 2014, at 3:28 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>>> So a delete is really another write for gc_grace_seconds (default 10 days); if you get enough tombstones, it can make managing your cluster a challenge as is. Open up cqlsh, turn on tracing, and try a few queries: how many tombstones are scanned for a given query? It's possible the heap problems you're seeing are actually happening on the query side and not on the ingest side. The severity of this depends on driver and Cassandra version, but older drivers and versions of Cassandra could easily overload the heap with expensive selects; layered over tombstones, it certainly becomes a possibility that this is your root cause.
>>>>
>>>> Now this will primarily create more load on compaction, and depending on your Cassandra version there may be some other issue at work. But something I can tell you is that every time I see 1 dropped mutation, I see a cluster that was overloaded enough that it had to shed load. If I see 200k, I see a cluster/configuration/hardware that is badly overloaded.
>>>>
>>>> I suggest the following:
>>>> - trace some of the queries used in prod
>>>> - monitor your ingest rate; see at what levels you run into issues (GCInspector log messages, dropped mutations, etc.)
>>>> - the heap configuration we mentioned earlier: go ahead and monitor heap usage; if it hits 75% repeatedly, this is an indication of heavy load
>>>> - monitor dropped mutations: any dropped mutation is evidence of an overloaded server; again, the root cause can be many other problems that are solvable with current hardware, and LOTS of people run nodes with a similar configuration
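If you do lower gc_grace_seconds, the usual caveat is that it has to stay comfortably longer than the interval between repairs, because a tombstone purged before every replica has seen the delete can let the deleted data resurface. The change itself is a one-line schema alteration; a sketch for the table in this thread (86400 seconds = 1 day):

    ALTER TABLE media.media_tracks_raw WITH gc_grace_seconds = 86400;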
>>>>
>>>> On Tue, Dec 16, 2014 at 5:08 PM, Arne Claassen <a...@emotient.com> wrote:
>>>> Not using any secondary indices, and memtable_flush_queue_size is the default 4.
>>>>
>>>> But let me tell you how data is "mutated" right now; maybe that will give you an insight into how this is happening.
>>>>
>>>> Basically, the frame data table has the following primary key: PRIMARY KEY ((id), trackid, "timestamp")
>>>>
>>>> Generally, data is inserted once, so day-to-day writes are all new rows. However, when our process for generating analytics for these rows changes, we run the media back through again, causing overwrites.
>>>>
>>>> Up until last night, this was just a new insert, because the PK never changed, so it was always a 1-to-1 overwrite of every row.
>>>>
>>>> Last night was the first time a change went in where the PK could actually change, so now the process is always: DELETE by partition key, insert all rows for the partition key, repeat.
>>>>
>>>> We have two tables with similar frame data projections and some other aggregates with a much smaller row count per partition key.
>>>>
>>>> hope that helps,
>>>> arne
>>>>
>>>> On Dec 16, 2014, at 2:46 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>>>> So you've got some blocked flush writers, but you have an incredibly large number of dropped mutations. Are you using secondary indexes, and if so, how many? What is your flush queue set to?
>>>>>
>>>>> On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen <a...@emotient.com> wrote:
>>>>> Of course QA decided to start a test batch (still relatively low traffic), so I hope it doesn't throw the tpstats off too much.
>>>>>
>>>>> Node 1:
>>>>> Pool Name                Active   Pending   Completed   Blocked   All time blocked
>>>>> MutationStage                 0         0    13804928         0                  0
>>>>> ReadStage                     0         0       10975         0                  0
>>>>> RequestResponseStage          0         0     7725378         0                  0
>>>>> ReadRepairStage               0         0        1247         0                  0
>>>>> ReplicateOnWriteStage         0         0           0         0                  0
>>>>> MiscStage                     0         0           0         0                  0
>>>>> HintedHandoff                 1         1          50         0                  0
>>>>> FlushWriter                   0         0         306         0                 31
>>>>> MemoryMeter                   0         0         719         0                  0
>>>>> GossipStage                   0         0      286505         0                  0
>>>>> CacheCleanupExecutor          0         0           0         0                  0
>>>>> InternalResponseStage         0         0           0         0                  0
>>>>> CompactionExecutor            4        14         159         0                  0
>>>>> ValidationExecutor            0         0           0         0                  0
>>>>> MigrationStage                0         0           0         0                  0
>>>>> commitlog_archiver            0         0           0         0                  0
>>>>> AntiEntropyStage              0         0           0         0                  0
>>>>> PendingRangeCalculator        0         0          11         0                  0
>>>>> MemtablePostFlusher           0         0        1781         0                  0
>>>>>
>>>>> Message type       Dropped
>>>>> READ                     0
>>>>> RANGE_SLICE              0
>>>>> _TRACE                   0
>>>>> MUTATION            391041
>>>>> COUNTER_MUTATION         0
>>>>> BINARY                   0
>>>>> REQUEST_RESPONSE         0
>>>>> PAGED_RANGE              0
>>>>> READ_REPAIR              0
>>>>>
>>>>> Node 2:
>>>>> Pool Name                Active   Pending   Completed   Blocked   All time blocked
>>>>> MutationStage                 0         0      997042         0                  0
>>>>> ReadStage                     0         0        2623         0                  0
>>>>> RequestResponseStage          0         0      706650         0                  0
>>>>> ReadRepairStage               0         0         275         0                  0
>>>>> ReplicateOnWriteStage         0         0           0         0                  0
>>>>> MiscStage                     0         0           0         0                  0
>>>>> HintedHandoff                 2         2          12         0                  0
>>>>> FlushWriter                   0         0          37         0                  4
>>>>> MemoryMeter                   0         0          70         0                  0
>>>>> GossipStage                   0         0       14927         0                  0
>>>>> CacheCleanupExecutor          0         0           0         0                  0
>>>>> InternalResponseStage         0         0           0         0                  0
>>>>> CompactionExecutor            4         7          94         0                  0
>>>>> ValidationExecutor            0         0           0         0                  0
>>>>> MigrationStage                0         0           0         0                  0
>>>>> commitlog_archiver            0         0           0         0                  0
>>>>> AntiEntropyStage              0         0           0         0                  0
>>>>> PendingRangeCalculator        0         0           3         0                  0
>>>>> MemtablePostFlusher           0         0         114         0                  0
>>>>>
>>>>> Message type       Dropped
>>>>> READ                     0
>>>>> RANGE_SLICE              0
>>>>> _TRACE                   0
>>>>> MUTATION                 0
>>>>> COUNTER_MUTATION         0
>>>>> BINARY                   0
>>>>> REQUEST_RESPONSE         0
>>>>> PAGED_RANGE              0
>>>>> READ_REPAIR              0
>>>>>
>>>>> Node 3:
>>>>> Pool Name                Active   Pending   Completed   Blocked   All time blocked
>>>>> MutationStage                 0         0     1539324         0                  0
>>>>> ReadStage                     0         0        2571         0                  0
>>>>> RequestResponseStage          0         0      373300         0                  0
>>>>> ReadRepairStage               0         0         325         0                  0
>>>>> ReplicateOnWriteStage         0         0           0         0                  0
>>>>> MiscStage                     0         0           0         0                  0
>>>>> HintedHandoff                 1         1          21         0                  0
>>>>> FlushWriter                   0         0          38         0                  5
>>>>> MemoryMeter                   0         0          59         0                  0
>>>>> GossipStage                   0         0       21491         0                  0
>>>>> CacheCleanupExecutor          0         0           0         0                  0
>>>>> InternalResponseStage         0         0           0         0                  0
>>>>> CompactionExecutor            4         9          85         0                  0
>>>>> ValidationExecutor            0         0           0         0                  0
>>>>> MigrationStage                0         0           0         0                  0
>>>>> commitlog_archiver            0         0           0         0                  0
>>>>> AntiEntropyStage              0         0           0         0                  0
>>>>> PendingRangeCalculator        0         0           6         0                  0
>>>>> MemtablePostFlusher           0         0         164         0                  0
>>>>>
>>>>> Message type       Dropped
>>>>> READ                     0
>>>>> RANGE_SLICE              0
>>>>> _TRACE                   0
>>>>> MUTATION            205259
>>>>> COUNTER_MUTATION         0
>>>>> BINARY                   0
>>>>> REQUEST_RESPONSE         0
>>>>> PAGED_RANGE              0
>>>>> READ_REPAIR             18
>>>>>
>>>>> Compaction seems like the only thing consistently active and pending.
>>>>>
>>>>> On Tue, Dec 16, 2014 at 2:18 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>>>> OK, based on those numbers I have a theory...
>>>>>
>>>>> Can you show me nodetool tpstats for all 3 nodes?
>>>>>
>>>>> On Tue, Dec 16, 2014 at 4:04 PM, Arne Claassen <a...@emotient.com> wrote:
>>>>> No problem with the follow-up questions. I'm on a crash course here, trying to understand what makes C* tick, so I appreciate all feedback.
>>>>>
>>>>> We reprocessed all media (1200 partition keys) last night, where partition keys had somewhere between 4k and 200k "rows". After that completed, no traffic went to the cluster at all for ~8 hours, and throughout today we may get a couple (fewer than 10) queries per second and maybe 3-4 write batches per hour.
>>>>>
>>>>> I assume the last value in the partition size histogram is the largest row:
>>>>>
>>>>> 20924300 bytes: 79
>>>>> 25109160 bytes: 57
>>>>>
>>>>> The majority seems clustered around 200000 bytes.
>>>>>
>>>>> I will look at switching my inserts to unlogged batches, since they are always for one partition key.
>>>>>
>>>>> On Tue, Dec 16, 2014 at 1:47 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>>>> Can you define what "virtually no traffic" is? Sorry to be repetitive about that, but I've worked on a lot of clusters in the past year, and people have wildly different ideas of what that means.
>>>>>
>>>>> Unlogged batches on the same partition key are definitely a performance optimization. Typically, async is much faster and easier on the cluster when you're using multi-partition-key batches.
>>>>>
>>>>> nodetool cfhistograms <keyspace> <tablename>
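A sketch of the single-partition unlogged batching discussed above, against the driver version named in this thread (Java Driver 2.1.x). The payload column and the bound types are assumptions, since only the PRIMARY KEY ((id), trackid, "timestamp") shape is given in the thread; very large blocks should still be chunked into batches of a few hundred rows:

    import com.datastax.driver.core.BatchStatement;
    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;
    import java.util.Date;
    import java.util.UUID;

    public class SinglePartitionBatch {
        public static void main(String[] args) {
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            Session session = cluster.connect("media");
            PreparedStatement insert = session.prepare(
                "INSERT INTO media_tracks_raw (id, trackid, \"timestamp\", payload) "
                    + "VALUES (?, ?, ?, ?)");

            UUID id = UUID.randomUUID();  // the one partition key for this block of rows

            // UNLOGGED skips the atomic batch log; since every statement targets the
            // same partition, the whole batch is applied as a single mutation anyway
            BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
            for (int frame = 0; frame < 500; frame++) {
                batch.add(insert.bind(id, UUID.randomUUID(), new Date(), "frame-data"));
            }
            session.execute(batch);
            cluster.close();
        }
    }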
>>>>>
>>>>> On Tue, Dec 16, 2014 at 3:42 PM, Arne Claassen <a...@emotient.com> wrote:
>>>>> Actually, I'm not sure why the machine was originally configured at 6GB, since we even started it on an r3.large with 15GB.
>>>>>
>>>>> Re: Batches
>>>>> Not using batches. I actually have that as a separate question on the list. Currently I fan out async single inserts, and I'm wondering if batches are better, since my data is inherently inserted in blocks of ordered rows for a single partition key.
>>>>>
>>>>> Re: Traffic
>>>>> There isn't all that much traffic. Inserts come in as blocks per partition key, but a block can be 5k-200k rows for that partition key. Each of these rows is less than 100k: small, lots of ordered rows. It's frame and sub-frame information for media, and the rows for one piece of media (the partition key) are inserted at once.
>>>>>
>>>>> For the last 12 hours, where the load on all these machines has been stuck, there's been virtually no traffic at all. This is the nodes basically sitting idle, except that they had a load of 4 each.
>>>>>
>>>>> BTW, how do you determine the widest row, or for that matter, the number of tombstones in a row?
>>>>>
>>>>> thanks,
>>>>> arne
>>>>>
>>>>> On Tue, Dec 16, 2014 at 1:24 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>>>> So 1024 is still a good 2.5 times what I'm suggesting, and 6GB is hardly enough to run Cassandra well in, especially if you're going full bore on loads. However, you may just flat out be CPU bound on your write throughput; how many TPS and what size writes do you have? Also, what is your widest row?
>>>>>
>>>>> Final question: what is compaction throughput at?
>>>>>
>>>>> On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen <a...@emotient.com> wrote:
>>>>> The starting configuration I had, which is still running on two of the nodes, was a 6GB heap with 1024MB parnew, which is close to what you are suggesting, and those have been pegged at load 4 for over 12 hours with hardly any read or write traffic. I will set one to 8GB/400MB and see if its load changes.
>>>>>
>>>>> On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>>>> So a heap of that size without some tuning will create a number of problems (high CPU usage is one of them). I suggest either an 8GB heap and 400MB parnew (which I'd only set that low for that low a CPU count), or attempt the tunings indicated in https://issues.apache.org/jira/browse/CASSANDRA-8150
>>>>>
>>>>> On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen <a...@emotient.com> wrote:
>>>>> Changed the 15GB node to a 25GB heap and the nice CPU is down to ~20% now. Checked my dev cluster to see if the ParNew log entries are just par for the course, but I'm not seeing them there. However, both have the following every 30 seconds:
>>>>>
>>>>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java (line 165) Started replayAllFailedBatches
>>>>> DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899 ColumnFamilyStore.java (line 866) forceFlush requested but everything is clean in batchlog
>>>>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java (line 200) Finished replayAllFailedBatches
>>>>>
>>>>> Is that just routine scheduled housekeeping, or a sign of something else?
>>>>>
>>>>> On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen <a...@emotient.com> wrote:
>>>>> Sorry, I meant a 15GB heap on the one machine that has less nice CPU% now. The others are 6GB.
>>>>>
>>>>> On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen <a...@emotient.com> wrote:
>>>>> AWS r3.xlarge, 30GB, but only using a heap of 10GB, new gen 2GB, because we might go c3.2xlarge instead if CPU is more important than RAM. Storage is optimized EBS SSD (but iostat shows no real I/O going on). Each node only has about 10GB of data, with ownership of 67%, 64.7% & 68.3%.
>>>>>
>>>>> On the node where I set the heap to 10GB from 6GB, the utilization has dropped to 46% nice now, but the ParNew log messages still continue at the same pace. I'm gonna up the heap to 20GB for a bit and see if that brings that nice CPU further down.
>>>>>
>>>>> No TombstoneOverflowingExceptions.
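For reference, the heap knobs being adjusted in this exchange live in conf/cassandra-env.sh; a sketch of the 8GB heap / 400MB new-gen setting suggested above (the node needs a restart to pick it up):

    MAX_HEAP_SIZE="8G"     # total JVM heap
    HEAP_NEWSIZE="400M"    # young generation, the "parnew" size in this thread

Compaction throughput is set in cassandra.yaml (compaction_throughput_mb_per_sec, default 16) and can be changed at runtime:

    nodetool setcompactionthroughput 0    # 0 removes the throttle until restart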
>>>>>
>>>>> On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>>>> What's the CPU, RAM, storage layer, and data density per node? Exact heap settings would be nice. In the logs, look for TombstoneOverflowingException.
>>>>>
>>>>> On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen <a...@emotient.com> wrote:
>>>>> I'm running 2.0.10.
>>>>>
>>>>> The data is all time series data, and as we change our pipeline, we've been periodically reprocessing the data sources, which causes each time series to be overwritten, i.e. every row per partition key is deleted and re-written, so I assume I've been collecting a bunch of tombstones.
>>>>>
>>>>> Also, I assumed the ever-present and never-completing compaction tasks were an artifact of tombstoning, but I fully admit that's conjecture based on the ~20 blog posts and Stack Overflow questions I've surveyed.
>>>>>
>>>>> I doubled the heap on one node and it changed nothing regarding the load or the ParNew log statements. New generation usage is 50%; Eden itself is 56%.
>>>>>
>>>>> If there's anything else I should look at and report, let me know.
>>>>>
>>>>> On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield <jlacefi...@datastax.com> wrote:
>>>>> Hello,
>>>>>
>>>>> What version of Cassandra are you running?
>>>>>
>>>>> If it's 2.0, we recently experienced something similar with 8447 [1], which 8485 [2] should hopefully resolve.
>>>>>
>>>>> Please note that 8447 is not related to tombstones. Tombstone processing can put a lot of pressure on the heap as well. Why do you think you have a lot of tombstones in that one particular table?
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
>>>>> [2] https://issues.apache.org/jira/browse/CASSANDRA-8485
>>>>>
>>>>> Jonathan
>>>>>
>>>>> Jonathan Lacefield
>>>>> Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
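A quick way to scan for the log messages called out above (the path assumes a standard package install):

    grep -E 'TombstoneOverflowingException|GCInspector|ERROR|WARN' \
        /var/log/cassandra/system.log | tail -50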
>>>>>
>>>>> On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen <a...@emotient.com> wrote:
>>>>> I have a three node cluster that has been sitting at a load of 4 (for each node) and 100% CPU utilization (although 92% nice) for the last 12 hours, ever since some significant writes finished. I'm trying to determine what tuning I should be doing to get it out of this state. The debug log is just an endless series of:
>>>>>
>>>>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634880
>>>>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is 8000634880
>>>>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is 8000634880
>>>>>
>>>>> iostat shows virtually no I/O.
>>>>>
>>>>> Compaction may enter into this, but I don't really know what to make of the compaction stats, since they never change:
>>>>>
>>>>> [root@cassandra-37919c3a ~]# nodetool compactionstats
>>>>> pending tasks: 10
>>>>>   compaction type   keyspace   table              completed    total         unit    progress
>>>>>        Compaction   media      media_tracks_raw   271651482    563615497     bytes   48.20%
>>>>>        Compaction   media      media_tracks_raw   30308910     21676695677   bytes    0.14%
>>>>>        Compaction   media      media_tracks_raw   1198384080   1815603161    bytes   66.00%
>>>>> Active compaction remaining time : 0h22m24s
>>>>>
>>>>> 5 minutes later:
>>>>>
>>>>> [root@cassandra-37919c3a ~]# nodetool compactionstats
>>>>> pending tasks: 9
>>>>>   compaction type   keyspace   table              completed    total         unit    progress
>>>>>        Compaction   media      media_tracks_raw   271651482    563615497     bytes   48.20%
>>>>>        Compaction   media      media_tracks_raw   30308910     21676695677   bytes    0.14%
>>>>>        Compaction   media      media_tracks_raw   1198384080   1815603161    bytes   66.00%
>>>>> Active compaction remaining time : 0h22m24s
>>>>>
>>>>> Sure, the pending tasks went down by one, but the rest is identical. media_tracks_raw likely has a bunch of tombstones (I can't figure out how to get stats on that).
>>>>>
>>>>> Is this behavior something that indicates that I need more heap or a larger new generation? Should I be manually running compaction on tables with lots of tombstones?
>>>>>
>>>>> Any suggestions, or places to educate myself better on performance tuning, would be appreciated.
>>>>>
>>>>> arne
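On the open questions in this thread about widest rows and per-row tombstone counts: in 2.0.x, the closest built-in answers are probably the tools already mentioned above:

    nodetool cfhistograms media media_tracks_raw   # row-size histogram; the last bucket is the widest row
    nodetool cfstats                               # per-table compacted row min/mean/max sizes

Per-query tombstone counts only surface in traces ("Read N live and M tombstoned cells"); later releases add tombstones-per-slice numbers to the cfstats output.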