Re: Advice on memory warning

2013-04-25 Thread aaron morton
There have been a lot of discussions about GC tuning on this mailing list. Here's 
a quick set of guidelines I use; please search the mail archive if it does not 
answer your question. 

If heavy GC activity correlates with Cassandra compaction, do one or more of the following (a config sketch follows the list):
* reduce concurrent_compactors to 2 or 3
* reduce compaction_throughput_mb_per_sec
* reduce in_memory_compaction_limit_in_mb
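
For example, something along these lines in cassandra.yaml; the numbers are only 
illustrative starting points, not recommendations, and should be sized to your hardware:

    concurrent_compactors: 2
    compaction_throughput_mb_per_sec: 8
    in_memory_compaction_limit_in_mb: 32

The throughput cap can also be changed on a running node without a restart, e.g. 
nodetool setcompactionthroughput 8.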

These are heavy-handed changes designed to get things under control; you will 
probably want to back some of them out later. 

Enable GC logging in cassandra-env.sh and look at how much memory is in use 
after a full GC / CMS collection. If this is more than 50% of the heap you may end 
up doing a lot of GC. If you have hundreds of millions of rows per node, on pre-1.2, 
raise bloom_filter_fp_chance on the CF's and increase index_interval in 
cassandra.yaml to reduce JVM memory use. 
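
To turn on GC logging, the stock cassandra-env.sh ships with a commented-out block 
along these lines; uncomment it (the log path is just an example):

    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

For the index sampling, the relevant cassandra.yaml setting looks like this (256 is 
only an example; the default is 128, and a larger value keeps less of the primary 
index in memory at the cost of slightly slower reads):

    index_interval: 256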

If you have wide rows consider using (on 4 to 8 cores; see the cassandra-env.sh sketch below)
HEAP_NEWSIZE: 1000M
SurvivorRatio 4
MaxTenuringThreshold 4
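
In cassandra-env.sh that looks roughly like this. It is a sketch, not a drop-in block: 
the stock script already sets SurvivorRatio and MaxTenuringThreshold (to 8 and 1 by 
default), so adjust the existing lines rather than adding duplicates:

    HEAP_NEWSIZE="1000M"
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"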

Look at the tenuring distribution in the GC log to see how many ParNew passes 
objects survive. If you often see objects only reaching tenuring age 1 or 2, 
consider running with MaxTenuringThreshold 2. This can help reduce the amount 
of premature tenuring. 
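
With -XX:+PrintTenuringDistribution enabled, each ParNew entry in the GC log includes 
a breakdown shaped roughly like the following (the numbers are made up for illustration):

    Desired survivor size 104857600 bytes, new threshold 4 (max 4)
    - age   1:   52428800 bytes,   52428800 total
    - age   2:    8388608 bytes,   60817408 total

If almost all of the bytes sit at ages 1 and 2 and little survives beyond that, dropping 
the cap to -XX:MaxTenuringThreshold=2 in cassandra-env.sh reduces how many times the 
same objects are copied between the survivor spaces.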

GC problems are a combination of workload and configuration, and sometimes take 
a while to sort out. 

Hope that helps 
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/04/2013, at 11:53 PM, Michael Theroux mthero...@yahoo.com wrote:

 Hello,
 
 Just to wrap up on my part of this thread, tuning the CMS occupancy threshold 
 (-XX:CMSInitiatingOccupancyFraction) to 70 appears to have resolved my issues with 
 the memory warnings.  However, I don't believe this would be a solution to 
 all the issues mentioned below.  That said, it does make sense to me to tune this 
 value below the flush_largest_memtables_at value in cassandra.yaml so a CMS 
 collection will kick in before we start flushing memtables to free memory.
 
 Thanks!
 -Mike
 
 On Apr 23, 2013, at 12:47 PM, Haithem Jarraya wrote:
 
 We are facing a similar issue, and we are not able to keep the ring stable.  
 We are using C* 1.2.3 on CentOS 6, 32GB RAM, 8GB heap, 6 nodes.
 The total data is ~84GB (which is relatively small for C* to handle, with an RF 
 of 3).  Our application is read-heavy, and we see the GC complaints on all 
 nodes; I copied and pasted the output below.
 We also usually see much larger values in the Pending column for ReadStage; not sure 
 what the best advice for this is.
 
 Thanks,
 
 Haithem
  
 INFO [ScheduledTasks:1] 2013-04-23 16:40:02,118 GCInspector.java (line 119) 
 GC for ConcurrentMarkSweep: 911 ms for 1 collections, 5945542968 used; max 
 is 8199471104
  INFO [ScheduledTasks:1] 2013-04-23 16:40:16,051 GCInspector.java (line 119) 
 GC for ConcurrentMarkSweep: 322 ms for 1 collections, 5639896576 used; max 
 is 8199471104
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,829 GCInspector.java (line 119) 
 GC for ConcurrentMarkSweep: 2273 ms for 1 collections, 6762618136 used; max 
 is 8199471104
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line 53) 
 Pool Name                    Active   Pending   Blocked
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line 68) 
 ReadStage 4 4 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) 
 RequestResponseStage  1 6 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) 
 ReadRepairStage   0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) 
 MutationStage 0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) 
 ReplicateOnWriteStage 0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) 
 GossipStage   0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) 
 AntiEntropyStage  0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) 
 MigrationStage0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) 
 MemtablePostFlusher   0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) 
 FlushWriter   0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) 
 MiscStage 0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) 
 commitlog_archiver 0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) 
 InternalResponseStage 0 0 0
  INFO 

Re: Advice on memory warning

2013-04-24 Thread Michael Theroux
Hello,

Just to wrap up on my part of this thread, tuning the CMS occupancy threshold 
(-XX:CMSInitiatingOccupancyFraction) to 70 appears to have resolved my issues with 
the memory warnings.  However, I don't believe this would be a solution to all 
the issues mentioned below.  That said, it does make sense to me to tune this value 
below the flush_largest_memtables_at value in cassandra.yaml so a CMS 
collection will kick in before we start flushing memtables to free memory.
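
For reference, the knobs involved look something like the following; the -XX flags 
live in cassandra-env.sh (UseCMSInitiatingOccupancyOnly is already set there by 
default), and 0.75 is just the stock default for flush_largest_memtables_at, shown 
for comparison:

    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

and in cassandra.yaml:

    flush_largest_memtables_at: 0.75

Note the two thresholds are not measured against exactly the same thing (the occupancy 
fraction is a percentage of the old generation, the flush threshold a fraction of total 
heap), but keeping the CMS trigger comfortably below the flush threshold gives the 
collector a chance to run first.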

Thanks!
-Mike

On Apr 23, 2013, at 12:47 PM, Haithem Jarraya wrote:

 We are facing a similar issue, and we are not able to keep the ring stable.  We 
 are using C* 1.2.3 on CentOS 6, 32GB RAM, 8GB heap, 6 nodes.
 The total data is ~84GB (which is relatively small for C* to handle, with an RF 
 of 3).  Our application is read-heavy, and we see the GC complaints on all nodes; 
 I copied and pasted the output below.
 We also usually see much larger values in the Pending column for ReadStage; not sure 
 what the best advice for this is.
 
 Thanks,
 
 Haithem
  
 INFO [ScheduledTasks:1] 2013-04-23 16:40:02,118 GCInspector.java (line 119) 
 GC for ConcurrentMarkSweep: 911 ms for 1 collections, 5945542968 used; max is 
 8199471104
  INFO [ScheduledTasks:1] 2013-04-23 16:40:16,051 GCInspector.java (line 119) 
 GC for ConcurrentMarkSweep: 322 ms for 1 collections, 5639896576 used; max is 
 8199471104
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,829 GCInspector.java (line 119) 
 GC for ConcurrentMarkSweep: 2273 ms for 1 collections, 6762618136 used; max 
 is 8199471104
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line 53) 
 Pool Name                    Active   Pending   Blocked
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,830 StatusLogger.java (line 68) 
 ReadStage 4 4 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) 
 RequestResponseStage  1 6 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) 
 ReadRepairStage   0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) 
 MutationStage 0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,831 StatusLogger.java (line 68) 
 ReplicateOnWriteStage 0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) 
 GossipStage   0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) 
 AntiEntropyStage  0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) 
 MigrationStage0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,832 StatusLogger.java (line 68) 
 MemtablePostFlusher   0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) 
 FlushWriter   0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) 
 MiscStage 0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,833 StatusLogger.java (line 68) 
 commitlog_archiver 0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) 
 InternalResponseStage 0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) 
 AntiEntropySessions   0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,834 StatusLogger.java (line 68) 
 HintedHandoff 0 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,843 StatusLogger.java (line 73) 
 CompactionManager 0 0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 85) 
 MessagingService n/a 15,1
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 95) 
 Cache Type Size Capacity   
 KeysToSave Provider
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 96) 
 KeyCache 251658064 251658081 all 
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 102) 
 RowCache 0 0 all org.apache.cassandra.cache.SerializingCacheProvider
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,844 StatusLogger.java (line 109) 
 ColumnFamily                Memtable ops,data
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 112) 
 system.local  0,0
  INFO [ScheduledTasks:1] 2013-04-23 16:40:30,845 StatusLogger.java (line 112) 
 system.peers  0,0
  INFO 

Re: Advice on memory warning

2013-04-23 Thread Ralph Goers
We are using DSE, which I believe is also based on 1.1.9.  We have had a basically 
unusable cluster for months due to this error.  In our case, once a node starts 
doing this it starts flushing memtables to disk and eventually fills up the disk 
to the point where it can't compact.  If we catch it soon enough and restart 
the node, it usually recovers.

In our case the heap size is 12 GB. As I understand it, Cassandra will give 1/3 
of that to memtables. I then noticed that we have one column family that is 
using nearly 4GB of bloom filters on each node.  Since the nodes start 
doing this when the heap reaches 9GB, we essentially only have 1GB of free 
memory, so when compactions, cleanups, etc. take place this situation starts 
happening.  We are working to change our data model to try to resolve this.
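
One way to see where that bloom filter memory is going is nodetool cfstats, which 
reports a Bloom Filter Space Used figure per column family. On 1.1 the filter can 
then be relaxed for the worst offenders with something like this (cassandra-cli 
syntax; the CF name and value are only illustrative):

    update column family MyWideCF with bloom_filter_fp_chance = 0.1;

The new setting only applies as sstables are rewritten, so the memory is not released 
until the affected sstables go through compaction (or a scrub/upgradesstables pass).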

Ralph 

On Apr 19, 2013, at 8:00 AM, Michael Theroux wrote:

 Hello,
 
 We've recently upgraded from m1.large to m1.xlarge instances on AWS to handle 
 additional load, but also to relieve memory pressure.  It appears to have 
 accomplished both; however, we are still getting a warning, 0-3 times a day, 
 on our database nodes:
 
 WARN [ScheduledTasks:1] 2013-04-19 14:17:46,532 GCInspector.java (line 145) 
 Heap is 0.7529240824406468 full.  You may need to reduce memtable and/or 
 cache sizes.  Cassandra will now flush up to the two largest memtables to 
 free up memory.  Adjust flush_largest_memtables_at threshold in 
 cassandra.yaml if you don't want Cassandra to do this automatically
 
 This is happening much less frequently than before the upgrade, but after 
 essentially doubling the amount of available memory, I'm curious what I 
 can do to determine what is happening during this time.  
 
 I am collecting all the JMX statistics.  Memtable space is elevated but not 
 extraordinarily high.  No GC messages are being output to the log.   
 
 These warnings do seem to be occurring during compactions of column families 
 using LCS with wide rows, but I'm not sure there is a direct correlation.
 
 We are running Cassandra 1.1.9, with a maximum heap of 8G.  
 
 Any advice?
 Thanks,
 -Mike



RE: Advice on memory warning

2013-04-21 Thread moshe.kranc
My experience (running C* 1.2.2): 

1. I also observe that this occurs during compaction. 
2. I have never yet seen a node recover from this state. Once it starts 
complaining about heap, it enters a death spiral, i.e., futile attempts to fix 
the situation. Eventually the node spends so long in GC that it appears 
down to the other nodes. 
3. Do you observe a lot of pending MemtablePostFlusher tasks in the log? I 
believe this is another symptom.
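
A quick way to check for that outside the log is nodetool tpstats, which lists 
active/pending/blocked counts per stage, e.g. (the host is just an example):

    nodetool -h 127.0.0.1 tpstats | grep -E 'Pool Name|MemtablePostFlusher|FlushWriter'

A steadily growing Pending count for MemtablePostFlusher or FlushWriter usually means 
flushes are backing up while the node struggles with GC.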

Like you, I would love to get advice on what causes this and how to avoid it.

-Original Message-
From: Michael Theroux [mailto:mthero...@yahoo.com] 
Sent: Friday, April 19, 2013 6:00 PM
To: user@cassandra.apache.org
Subject: Advice on memory warning

Hello,

We've recently upgraded from m1.large to m1.xlarge instances on AWS to handle 
additional load, but also to relieve memory pressure.  It appears to have 
accomplished both; however, we are still getting a warning, 0-3 times a day, on 
our database nodes:

WARN [ScheduledTasks:1] 2013-04-19 14:17:46,532 GCInspector.java (line 145) 
Heap is 0.7529240824406468 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically

This is happening much less frequently than before the upgrade, but after 
essentially doubling the amount of available memory, I'm curious what I can 
do to determine what is happening during this time.  

I am collecting all the JMX statistics.  Memtable space is elevated but not 
extraordinarily high.  No GC messages are being output to the log.   

These warnings do seem to be occurring during compactions of column families 
using LCS with wide rows, but I'm not sure there is a direct correlation.

We are running Cassandra 1.1.9, with a maximum heap of 8G.  

Any advice?
Thanks,
-Mike


Advice on memory warning

2013-04-19 Thread Michael Theroux
Hello,

We've recently upgraded from m1.large to m1.xlarge instances on AWS to handle 
additional load, but also to relieve memory pressure.  It appears to have 
accomplished both; however, we are still getting a warning, 0-3 times a day, on 
our database nodes:

WARN [ScheduledTasks:1] 2013-04-19 14:17:46,532 GCInspector.java (line 145) 
Heap is 0.7529240824406468 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically

This is happening much less frequently than before the upgrade, but after 
essentially doubling the amount of available memory, I'm curious what I can 
do to determine what is happening during this time.  

I am collecting all the JMX statistics.  Memtable space is elevated but not 
extraordinarily high.  No GC messages are being output to the log.   

These warnings do seem to be occurring during compactions of column families 
using LCS with wide rows, but I'm not sure there is a direct correlation.
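
One way to test that correlation (a sketch, not a definitive diagnosis) is to watch 
compaction activity and heap usage side by side while one of these warnings is in 
progress, e.g.:

    # what is compacting right now and how far along it is
    nodetool -h 127.0.0.1 compactionstats
    # current / max heap usage as reported by the JVM
    nodetool -h 127.0.0.1 info

If the heap figure from nodetool info only climbs while compactionstats shows large 
compactions on the wide-row LCS column families, that points at compaction as the 
driver; if it climbs regardless, the pressure is coming from somewhere else 
(memtables, caches, bloom filters).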

We are running Cassandra 1.1.9, with a maximum heap of 8G.  

Any advice?
Thanks,
-Mike