Re: Memory issue

2014-05-29 Thread Aaron Morton
   As soon as it starts, the JVM gets killed because of a memory issue.
What is the memory issue that kills the JVM ? 

The log message below is simply a warning:

 WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable to lock 
 JVM memory (ENOMEM).
 This can result in part of the JVM being swapped out, especially with mmapped 
 I/O enabled.
 Increase RLIMIT_MEMLOCK or run Cassandra as root.

Is there anything in the system logs ? 

Cheers
Aaron 
-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 24/05/2014, at 9:17 am, Robert Coli rc...@eventbrite.com wrote:

 On Fri, May 23, 2014 at 2:08 PM, opensaf dev opensaf...@gmail.com wrote:
 I have a different service which controls the cassandra service for high 
 availability.
 
 IMO, starting or stopping a Cassandra node should never be a side effect of 
 another system's properties. YMMV.
 
 See https://issues.apache.org/jira/browse/CASSANDRA-2356 for some related comments.
 
 =Rob
 



Re: Memory issue

2014-05-23 Thread Patricia Gorla
On Wed, May 21, 2014 at 12:59 AM, opensaf dev opensaf...@gmail.com wrote:

 When I run as user cassandra, it starts and runs fine.


Why do you want to run Cassandra as a different user?

-- 
Patricia Gorla
@patriciagorla

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Memory issue

2014-05-23 Thread opensaf dev
I have a different service which controls the cassandra service for high
availability.

Thanks
Dev


On Fri, May 23, 2014 at 7:35 AM, Patricia Gorla
patri...@thelastpickle.com wrote:


 On Wed, May 21, 2014 at 12:59 AM, opensaf dev opensaf...@gmail.com wrote:

 When I run as user cassandra, it starts and runs fine.


 Why do you want to run Cassandra as a different user?

 --
 Patricia Gorla
 @patriciagorla

 Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com



Re: Memory issue

2014-05-23 Thread Robert Coli
On Fri, May 23, 2014 at 2:08 PM, opensaf dev opensaf...@gmail.com wrote:

 I have a different service which controls the cassandra service for high
 availability.


IMO, starting or stopping a Cassandra node should never be a side effect of
another system's properties. YMMV.

See https://issues.apache.org/jira/browse/CASSANDRA-2356 for some related
comments.

=Rob


Re: Memory issue

2014-05-22 Thread opensaf dev
Well Romain, I had tried restarting the VM as well, but the problem still
remained.

What I noticed is that after some time, irrespective of whether I run Cassandra
as the other user or as the normal cassandra user, the problem remains. As soon
as it starts, the JVM gets killed because of a memory issue. Are there settings
other than the limits.conf file that I need to configure?

Also note that I don't have an /etc/security/limits.d/cassandra.conf file; I
just configured the limits in limits.conf. Even though I gave both users
(cassandra, X) the same group ID, it made no difference. Does Cassandra have to
be started under the cassandra user only? What special configuration is
required to run Cassandra under a different user?

Thanks
Dev

On Tue, May 20, 2014 at 10:44 PM, Romain HARDOUIN romain.hardo...@urssaf.fr
 wrote:

 Well... you have already changed the limits ;-)
 Keep in mind that changes in the limits.conf file will not affect
 processes that are already running.

 opensaf dev opensaf...@gmail.com wrote on 21/05/2014 06:59:05:

  From: opensaf dev opensaf...@gmail.com
  To: user@cassandra.apache.org
  Date: 21/05/2014 07:00
  Subject: Memory issue
 
  Hi guys,
 
  I am trying to run Cassandra on CentOS as a user X other than root
  or cassandra. When I run as user cassandra, it starts and runs fine.
  But when I run under user X, I get the below error once
  Cassandra has started, and the system freezes totally.
 
  Insufficient memlock settings:
  WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable
  to lock JVM memory (ENOMEM).
  This can result in part of the JVM being swapped out, especially
  with mmapped I/O enabled.
  Increase RLIMIT_MEMLOCK or run Cassandra as root.
 
 
  I have tried the tips available online to change the memlock and
  other limits for both users cassandra and X, but it did not solve the
  problem.
 

  What else should I consider when running Cassandra as a user other than
  cassandra/root?
 
 
  Any help is much appreciated.
 
 
  Thanks
  Dev
 



RE: Memory issue

2014-05-20 Thread Romain HARDOUIN
Hi,

You have to define limits for the user. 
Here is an example for the user cassandra:

# cat /etc/security/limits.d/cassandra.conf 
cassandra   -   memlock unlimited
cassandra   -   nofile  10
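These limits are applied per login session by PAM, so they only take effect in a fresh session. A quick sketch for verifying what a new shell actually receives (for the Cassandra account itself you would substitute a `su` invocation, as noted in the comment):

```shell
# Limits from limits.conf / limits.d are applied per login session by PAM,
# so verify from a fresh shell rather than an already-running one.
# For another user you would run e.g. `su - cassandra -c 'ulimit -l'`
# (the user name is an assumption).
sh -c 'ulimit -l'
```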

best,

Romain

opensaf dev opensaf...@gmail.com wrote on 21/05/2014 06:59:05:

 From: opensaf dev opensaf...@gmail.com
 To: user@cassandra.apache.org
 Date: 21/05/2014 07:00
 Subject: Memory issue
 
 Hi guys,
 
 I am trying to run Cassandra on CentOS as a user X other than root 
 or cassandra. When I run as user cassandra, it starts and runs fine.
 But when I run under user X, I get the below error once 
 Cassandra has started, and the system freezes totally.
 
 Insufficient memlock settings:
 WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable 
 to lock JVM memory (ENOMEM).
 This can result in part of the JVM being swapped out, especially 
 with mmapped I/O enabled.
 Increase RLIMIT_MEMLOCK or run Cassandra as root.
 
 
 I have tried the tips available online to change the memlock and 
 other limits for both users cassandra and X, but it did not solve the 
 problem.
 

 What else should I consider when running Cassandra as a user other than 
 cassandra/root?
 
 
 Any help is much appreciated.
 
 
 Thanks
 Dev
 


RE: Memory issue

2014-05-20 Thread Romain HARDOUIN
Well... you have already changed the limits ;-)
Keep in mind that changes in the limits.conf file will not affect 
processes that are already running.
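On Linux you can confirm what a live process actually inherited by reading its limits out of /proc; a minimal sketch (using `/proc/self` for illustration; for Cassandra you would substitute the JVM's pid, e.g. from `pgrep -f CassandraDaemon`):

```shell
# A running JVM keeps the limits it started with; editing limits.conf does
# not change them. Inspect a live process's effective limits via /proc
# (Linux-specific; /proc/self stands in for the Cassandra pid here).
grep -i "max locked memory" /proc/self/limits
```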

opensaf dev opensaf...@gmail.com wrote on 21/05/2014 06:59:05:

 From: opensaf dev opensaf...@gmail.com
 To: user@cassandra.apache.org
 Date: 21/05/2014 07:00
 Subject: Memory issue
 
 Hi guys,
 
 I am trying to run Cassandra on CentOS as a user X other than root 
 or cassandra. When I run as user cassandra, it starts and runs fine.
 But when I run under user X, I get the below error once 
 Cassandra has started, and the system freezes totally.
 
 Insufficient memlock settings:
 WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable 
 to lock JVM memory (ENOMEM).
 This can result in part of the JVM being swapped out, especially 
 with mmapped I/O enabled.
 Increase RLIMIT_MEMLOCK or run Cassandra as root.
 
 
 I have tried the tips available online to change the memlock and 
 other limits for both users cassandra and X, but it did not solve the 
 problem.
 

 What else should I consider when running Cassandra as a user other than 
 cassandra/root?
 
 
 Any help is much appreciated.
 
 
 Thanks
 Dev
 


Re: memory issue on 1.1.0

2012-06-06 Thread aaron morton
Mina, 
That does not sound right. 

If you have the time, can you create a JIRA ticket describing the 
problem? Please include:

* the GC logs gathered by enabling them here 
https://github.com/apache/cassandra/blob/trunk/conf/cassandra-env.sh#L165 (It 
would be good to see the node get into trouble if possible).
* OS, JVM and cassandra versions
* information on the schema and workload
* anything else you think is important. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 7:24 AM, Mina Naguib wrote:

 
 Hi Wade
 
 I don't know if your scenario matches mine, but I've been struggling with 
 memory pressure in 1.x as well.  I made the jump from 0.7.9 to 1.1.0, along 
 with enabling compression and levelled compactions, so I don't know which 
 specifically is the main culprit.
 
 Specifically, all my nodes seem to lose heap memory.  As parnew and CMS do 
 their job, over any reasonable period of time, the floor of memory after a 
 GC keeps rising.  This is quite visible if you leave jconsole connected for a 
 day or so, and manifests itself as a funny-looking cone like so: 
 http://mina.naguib.ca/images/cassandra_jconsole.png
 
 Once memory pressure reaches a point where the heap can't be maintained 
 reliably below 75%, cassandra goes into survival mode - via a bunch of 
 tunables in cassandra.yaml it'll do things like flush memtables, drop caches, 
 etc - all of which, in my experience, especially with the recent off-heap 
 data structures, exacerbate the problem.
 
 I've been meaning, of course, to collect enough technical data to file a bug 
 report, but haven't had the time.  I have not yet tested 1.1.1 to see if it 
 improves the situation.
 
 What I have found however, is a band-aid which you see at the rightmost 
 section of the graph in the screenshot I posted.  That is simply to hit the 
 "Perform GC" button in jconsole.  It seems that a full System.gc() *DOES* 
 reclaim heap memory that parnew and CMS fail to reclaim.
 
 On my production cluster I have a full-GC via JMX scheduled in a rolling 
 fashion every 4 hours.  It's extremely expensive (20-40 seconds of 
 unresponsiveness) but is a necessary evil in my situation.  Without it, my 
 nodes enter a nasty spiral of constant flushing, constant compactions, high 
 heap usage, instability and high latency.
 
 
 On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:
 
 Alas, upgrading to 1.1.1 did not solve my issue.
 
 -Original Message-
 From: Brandon Williams [mailto:dri...@gmail.com] 
 Sent: Monday, June 04, 2012 11:24 PM
 To: user@cassandra.apache.org
 Subject: Re: memory issue on 1.1.0
 
 Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741
 
 -Brandon
 
 On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L 
 wade.l.poziom...@intel.com wrote:
 Running a very write intensive (new column, delete old column etc.) process 
 and failing on memory.  Log file attached.
 
 Curiously, when I add new data I have never seen this; I have in the past sent 
 hundreds of millions of new transactions.  It seems to happen when I 
 modify.  My process is as follows:
 
 Key slice to get the columns to modify in batches of 100; in separate threads, 
 modify those columns.  I advance the slice, setting the start key each time to 
 the last key in the previous batch.  The mutations are: update a column value in 
 one column family (token), delete a column and add a new column in another (pan).
 
 Runs well until after about 5 million rows then it seems to run out of 
 memory.  Note that these column families are quite small.
 
 WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 
 145) Heap is 0.7967470834946492 full.  You may need to reduce memtable 
 and/or cache sizes.  Cassandra will now flush up to the two largest 
 memtables to free up memory.  Adjust flush_largest_memtables_at 
 threshold in cassandra.yaml if you don't want Cassandra to do this 
 automatically
 INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java 
 (line 2772) Unable to reduce heap usage since there are no dirty 
 column families
 INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
 InetAddress /10.230.34.170 is now UP
 INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java 
 (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; 
 max is 8506048512
 INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java 
 (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 
 5714800208 used; max is 8506048512
 
 
 Keyspace: keyspace
   Read Count: 50042632
   Read Latency: 0.23157864418482224 ms.
   Write Count: 44948323
   Write Latency: 0.019460829472992797 ms.
   Pending Tasks: 0
   Column Family: pan
   SSTable count: 5
   Space used (live): 1977467326
   Space used (total): 1977467326
   Number of Keys (estimate): 16334848
   Memtable

Re: memory issue on 1.1.0

2012-06-06 Thread aaron morton
I looked through the log again. Still looks like it's overloaded and not 
handling the overload very well. 

It looks like a sustained write load of around 280K columns every 5 minutes for 
about 5 hours. It may be that the CPU is the bottleneck when it comes to GC 
throughput. You are hitting ParNew issues from the very start, and end up with 
20 second CMS pauses. Do you see high CPU load ? 

Can you enable the GC logging options in cassandra-env.sh ? 
 
Can you throttle the test back to a level where the server does not fail ? 

Alternatively, can you dump the heap when it gets full and see what is taking 
up all the space ?
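For reference, GC logging is enabled by adding (or uncommenting) JVM options in conf/cassandra-env.sh; a minimal sketch for HotSpot 6/7-era JVMs follows. The log file path is an assumption; adjust it for your layout.

```shell
# Sketch of GC logging options for conf/cassandra-env.sh (HotSpot 6/7 era).
# The log file location below is an assumption; point it somewhere writable.
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
```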

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 2:12 PM, Poziombka, Wade L wrote:

 Ok, so I have completely refactored to remove deletes and it still fails. So 
 it is completely unrelated to deletes.
 
 I guess I need to go back to 1.0.10?  When I originally evaluated I ran 
 1.0.8... perhaps I went a bridge too far with 1.1.
 
 I don't think I am doing anything exotic here.
 
 Here is my column family.  
 
 KsDef(name:TB_UNIT, 
 strategy_class:org.apache.cassandra.locator.SimpleStrategy, 
 strategy_options:{replication_factor=3}, 
 cf_defs:[
 
 CfDef(keyspace:TB_UNIT, name:token, column_type:Standard, 
 comparator_type:BytesType, column_metadata:[ColumnDef(name:70 61 6E 45 6E 63, 
 validation_class:BytesType), ColumnDef(name:63 72 65 61 74 65 54 73, 
 validation_class:DateType), ColumnDef(name:63 72 65 61 74 65 44 61 74 65, 
 validation_class:DateType, index_type:KEYS, index_name:TokenCreateDate), 
 ColumnDef(name:65 6E 63 72 79 70 74 69 6F 6E 53 65 74 74 69 6E 67 73 49 44, 
 validation_class:UTF8Type, index_type:KEYS, 
 index_name:EncryptionSettingsID)], caching:keys_only), 
 
 CfDef(keyspace:TB_UNIT, name:pan_d721fd40fd9443aa81cc6f59c8e047c6, 
 column_type:Standard, comparator_type:BytesType, caching:keys_only), 
 
 CfDef(keyspace:TB_UNIT, name:counters, column_type:Standard, 
 comparator_type:BytesType, column_metadata:[ColumnDef(name:75 73 65 43 6F 75 
 6E 74, validation_class:CounterColumnType)], 
 default_validation_class:CounterColumnType, caching:keys_only)
 
 ])
 
 
 -Original Message-
 From: Poziombka, Wade L [mailto:wade.l.poziom...@intel.com] 
 Sent: Tuesday, June 05, 2012 3:09 PM
 To: user@cassandra.apache.org
 Subject: RE: memory issue on 1.1.0
 
 Thank you.  I do have some of the same observations.  Do you do deletes?
 
 My observation is that without deletes (or column updates, I guess) I can run 
 forever, happily.  But when I run (what for me is a batch process) operations 
 that delete and modify column values, I run into this.
 
 Reading bug https://issues.apache.org/jira/browse/CASSANDRA-3741, the advice 
 is to NOT do deletes individually but to truncate instead.  I am scrambling to 
 try to do this, but am curious whether it will be worth the effort.
 
 Wade
 
 -Original Message-
 From: Mina Naguib [mailto:mina.nag...@bloomdigital.com] 
 Sent: Tuesday, June 05, 2012 2:24 PM
 To: user@cassandra.apache.org
 Subject: Re: memory issue on 1.1.0
 
 
 Hi Wade
 
 I don't know if your scenario matches mine, but I've been struggling with 
 memory pressure in 1.x as well.  I made the jump from 0.7.9 to 1.1.0, along 
 with enabling compression and levelled compactions, so I don't know which 
 specifically is the main culprit.
 
 Specifically, all my nodes seem to lose heap memory.  As parnew and CMS do 
 their job, over any reasonable period of time, the floor of memory after a 
 GC keeps rising.  This is quite visible if you leave jconsole connected for a 
 day or so, and manifests itself as a funny-looking cone like so: 
 http://mina.naguib.ca/images/cassandra_jconsole.png
 
 Once memory pressure reaches a point where the heap can't be maintained 
 reliably below 75%, cassandra goes into survival mode - via a bunch of 
 tunables in cassandra.yaml it'll do things like flush memtables, drop caches, 
 etc - all of which, in my experience, especially with the recent off-heap 
 data structures, exacerbate the problem.
 
 I've been meaning, of course, to collect enough technical data to file a bug 
 report, but haven't had the time.  I have not yet tested 1.1.1 to see if it 
 improves the situation.
 
 What I have found however, is a band-aid which you see at the rightmost 
 section of the graph in the screenshot I posted.  That is simply to hit the 
 "Perform GC" button in jconsole.  It seems that a full System.gc() *DOES* 
 reclaim heap memory that parnew and CMS fail to reclaim.
 
 On my production cluster I have a full-GC via JMX scheduled in a rolling 
 fashion every 4 hours.  It's extremely expensive (20-40 seconds of 
 unresponsiveness) but is a necessary evil in my situation.  Without it, my 
 nodes enter a nasty spiral of constant flushing, constant compactions, high 
 heap usage, instability and high latency.
 
 
 On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:
 
 Alas, upgrading to 1.1.1 did not solve

Re: memory issue on 1.1.0

2012-06-06 Thread Tyler Hobbs
Just to check, do you have JNA set up correctly? (You should see a couple of
log messages about it shortly after startup.)  Note that truncate also performs
a snapshot by default.

On Wed, Jun 6, 2012 at 12:38 PM, Poziombka, Wade L 
wade.l.poziom...@intel.com wrote:


 However, after all the work I issued a truncate on the old column family
 (the one replaced by this process) and I get an out of memory condition
 then.




-- 
Tyler Hobbs
DataStax http://datastax.com/


RE: memory issue on 1.1.0

2012-06-06 Thread Poziombka, Wade L
I believe so.  There are no warnings on startup.

So is there a preferred way to completely eliminate a column family?

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, June 06, 2012 1:17 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Just to check, do you have JNA setup correctly? (You should see a couple of log 
messages about it shortly after startup.)  Truncate also performs a snapshot by 
default.
On Wed, Jun 6, 2012 at 12:38 PM, Poziombka, Wade L 
wade.l.poziom...@intel.com wrote:
However, after all the work I issued a truncate on the old column family (the 
one replaced by this process) and I get an out of memory condition then.



--
Tyler Hobbs
DataStax http://datastax.com/


Re: memory issue on 1.1.0

2012-06-06 Thread aaron morton
Use drop. 

Truncate is mostly for unit tests.
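For completeness, a hedged sketch of dropping the old column family from the 1.1-era cassandra-cli (keyspace and column family names are taken from the KsDef earlier in the thread; host and exact cli syntax may differ by version):

```shell
# Drop the replaced column family via cassandra-cli (1.1-era syntax assumed;
# run against a live node, localhost assumed here).
cassandra-cli -h localhost <<'EOF'
use TB_UNIT;
drop column family pan_d721fd40fd9443aa81cc6f59c8e047c6;
EOF
```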
A
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/06/2012, at 6:22 AM, Poziombka, Wade L wrote:

 I believe so.  There are no warnings on startup. 
  
 So is there a preferred way to completely eliminate a column family?
  
 From: Tyler Hobbs [mailto:ty...@datastax.com] 
 Sent: Wednesday, June 06, 2012 1:17 PM
 To: user@cassandra.apache.org
 Subject: Re: memory issue on 1.1.0
  
 Just to check, do you have JNA setup correctly? (You should see a couple of 
 log messages about it shortly after startup.)  Truncate also performs a 
 snapshot by default.
 
 On Wed, Jun 6, 2012 at 12:38 PM, Poziombka, Wade L 
 wade.l.poziom...@intel.com wrote:
 However, after all the work I issued a truncate on the old column family (the 
 one replaced by this process) and I get an out of memory condition then.
 
 
 
 -- 
 Tyler Hobbs
 DataStax
 



RE: memory issue on 1.1.0

2012-06-05 Thread Poziombka, Wade L
Alas, upgrading to 1.1.1 did not solve my issue.

-Original Message-
From: Brandon Williams [mailto:dri...@gmail.com] 
Sent: Monday, June 04, 2012 11:24 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741

-Brandon

On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L wade.l.poziom...@intel.com 
wrote:
 Running a very write intensive (new column, delete old column etc.) process 
 and failing on memory.  Log file attached.

 Curiously, when I add new data I have never seen this; I have in the past sent 
 hundreds of millions of new transactions.  It seems to happen when I 
 modify.  My process is as follows:
 
 Key slice to get the columns to modify in batches of 100; in separate threads, 
 modify those columns.  I advance the slice, setting the start key each time to 
 the last key in the previous batch.  The mutations are: update a column value in 
 one column family (token), delete a column and add a new column in another (pan).

 Runs well until after about 5 million rows then it seems to run out of 
 memory.  Note that these column families are quite small.

 WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 
 145) Heap is 0.7967470834946492 full.  You may need to reduce memtable 
 and/or cache sizes.  Cassandra will now flush up to the two largest 
 memtables to free up memory.  Adjust flush_largest_memtables_at 
 threshold in cassandra.yaml if you don't want Cassandra to do this 
 automatically
  INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java 
 (line 2772) Unable to reduce heap usage since there are no dirty 
 column families
  INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
 InetAddress /10.230.34.170 is now UP
  INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java 
 (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; 
 max is 8506048512
  INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java 
 (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 
 5714800208 used; max is 8506048512

 
 Keyspace: keyspace
        Read Count: 50042632
        Read Latency: 0.23157864418482224 ms.
        Write Count: 44948323
        Write Latency: 0.019460829472992797 ms.
        Pending Tasks: 0
                Column Family: pan
                SSTable count: 5
                Space used (live): 1977467326
                Space used (total): 1977467326
                Number of Keys (estimate): 16334848
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 74
                Read Count: 14985122
                Read Latency: 0.408 ms.
                Write Count: 19972441
                Write Latency: 0.022 ms.
                Pending Tasks: 0
                Bloom Filter False Postives: 829
                Bloom Filter False Ratio: 0.00073
                Bloom Filter Space Used: 37048400
                Compacted row minimum size: 125
                Compacted row maximum size: 149
                Compacted row mean size: 149

                Column Family: token
                SSTable count: 4
                Space used (live): 1250973873
                Space used (total): 1250973873
                Number of Keys (estimate): 14217216
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 49
                Read Count: 30059563
                Read Latency: 0.167 ms.
                Write Count: 14985488
                Write Latency: 0.014 ms.
                Pending Tasks: 0
                Bloom Filter False Postives: 13642
                Bloom Filter False Ratio: 0.00322
                Bloom Filter Space Used: 28002984
                Compacted row minimum size: 150
                Compacted row maximum size: 258
                Compacted row mean size: 224

                Column Family: counters
                SSTable count: 2
                Space used (live): 561549994
                Space used (total): 561549994
                Number of Keys (estimate): 9985024
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 38
                Read Count: 4997947
                Read Latency: 0.092 ms.
                Write Count: 9990394
                Write Latency: 0.023 ms.
                Pending Tasks: 0
                Bloom Filter False Postives: 191
                Bloom Filter False Ratio: 0.37525
                Bloom Filter Space Used: 18741152
                Compacted row minimum size: 125
                Compacted row maximum size: 179
                Compacted row mean size: 150

 


Re: memory issue on 1.1.0

2012-06-05 Thread Mina Naguib

Hi Wade

I don't know if your scenario matches mine, but I've been struggling with 
memory pressure in 1.x as well.  I made the jump from 0.7.9 to 1.1.0, along 
with enabling compression and levelled compactions, so I don't know which 
specifically is the main culprit.

Specifically, all my nodes seem to lose heap memory.  As parnew and CMS do 
their job, over any reasonable period of time, the floor of memory after a GC 
keeps rising.  This is quite visible if you leave jconsole connected for a day 
or so, and manifests itself as a funny-looking cone like so: 
http://mina.naguib.ca/images/cassandra_jconsole.png

Once memory pressure reaches a point where the heap can't be maintained 
reliably below 75%, cassandra goes into survival mode - via a bunch of tunables 
in cassandra.yaml it'll do things like flush memtables, drop caches, etc - all 
of which, in my experience, especially with the recent off-heap data 
structures, exacerbate the problem.

I've been meaning, of course, to collect enough technical data to file a bug 
report, but haven't had the time.  I have not yet tested 1.1.1 to see if it 
improves the situation.

What I have found however, is a band-aid which you see at the rightmost section 
of the graph in the screenshot I posted.  That is simply to hit the "Perform GC" 
button in jconsole.  It seems that a full System.gc() *DOES* reclaim heap 
memory that parnew and CMS fail to reclaim.

On my production cluster I have a full-GC via JMX scheduled in a rolling 
fashion every 4 hours.  It's extremely expensive (20-40 seconds of 
unresponsiveness) but is a necessary evil in my situation.  Without it, my 
nodes enter a nasty spiral of constant flushing, constant compactions, high 
heap usage, instability and high latency.


On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:

 Alas, upgrading to 1.1.1 did not solve my issue.
 
 -Original Message-
 From: Brandon Williams [mailto:dri...@gmail.com] 
 Sent: Monday, June 04, 2012 11:24 PM
 To: user@cassandra.apache.org
 Subject: Re: memory issue on 1.1.0
 
 Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741
 
 -Brandon
 
 On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L 
 wade.l.poziom...@intel.com wrote:
 Running a very write intensive (new column, delete old column etc.) process 
 and failing on memory.  Log file attached.
 
 Curiously, when I add new data I have never seen this; I have in the past sent 
 hundreds of millions of new transactions.  It seems to happen when I 
 modify.  My process is as follows:
 
 Key slice to get the columns to modify in batches of 100; in separate threads, 
 modify those columns.  I advance the slice, setting the start key each time to 
 the last key in the previous batch.  The mutations are: update a column value in 
 one column family (token), delete a column and add a new column in another (pan).
 
 Runs well until after about 5 million rows then it seems to run out of 
 memory.  Note that these column families are quite small.
 
 WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 
 145) Heap is 0.7967470834946492 full.  You may need to reduce memtable 
 and/or cache sizes.  Cassandra will now flush up to the two largest 
 memtables to free up memory.  Adjust flush_largest_memtables_at 
 threshold in cassandra.yaml if you don't want Cassandra to do this 
 automatically
  INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java 
 (line 2772) Unable to reduce heap usage since there are no dirty 
 column families
  INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
 InetAddress /10.230.34.170 is now UP
  INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java 
 (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; 
 max is 8506048512
  INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java 
 (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 
 5714800208 used; max is 8506048512
 
 
 Keyspace: keyspace
Read Count: 50042632
Read Latency: 0.23157864418482224 ms.
Write Count: 44948323
Write Latency: 0.019460829472992797 ms.
Pending Tasks: 0
Column Family: pan
SSTable count: 5
Space used (live): 1977467326
Space used (total): 1977467326
Number of Keys (estimate): 16334848
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 74
Read Count: 14985122
Read Latency: 0.408 ms.
Write Count: 19972441
Write Latency: 0.022 ms.
Pending Tasks: 0
Bloom Filter False Postives: 829
Bloom Filter False Ratio: 0.00073
Bloom Filter Space Used: 37048400
Compacted row minimum size: 125
Compacted row maximum size: 149
Compacted row mean size: 149
 
Column Family: token

RE: memory issue on 1.1.0

2012-06-05 Thread Poziombka, Wade L
Thank you.  I do have some of the same observations.  Do you do deletes?

My observation is that without deletes (or column updates, I guess) I can run 
forever, happily.  But when I run (what for me is a batch process) operations that 
delete and modify column values, I run into this.

Reading bug https://issues.apache.org/jira/browse/CASSANDRA-3741, the advice is 
to NOT do deletes individually but to truncate instead.  I am scrambling to try 
to do this, but am curious whether it will be worth the effort.

Wade

-Original Message-
From: Mina Naguib [mailto:mina.nag...@bloomdigital.com] 
Sent: Tuesday, June 05, 2012 2:24 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0


Hi Wade

I don't know if your scenario matches mine, but I've been struggling with 
memory pressure in 1.x as well.  I made the jump from 0.7.9 to 1.1.0, along 
with enabling compression and levelled compactions, so I don't know which 
specifically is the main culprit.

Specifically, all my nodes seem to lose heap memory.  As parnew and CMS do 
their job, over any reasonable period of time, the floor of memory after a GC 
keeps rising.  This is quite visible if you leave jconsole connected for a day 
or so, and manifests itself as a funny-looking cone like so: 
http://mina.naguib.ca/images/cassandra_jconsole.png

Once memory pressure reaches a point where the heap can't be maintained 
reliably below 75%, cassandra goes into survival mode - via a bunch of tunables 
in cassandra.yaml it'll do things like flush memtables, drop caches, etc - all 
of which, in my experience, especially with the recent off-heap data 
structures, exacerbate the problem.

I've been meaning, of course, to collect enough technical data to file a bug 
report, but haven't had the time.  I have not yet tested 1.1.1 to see if it 
improves the situation.

What I have found however, is a band-aid which you see at the rightmost section 
of the graph in the screenshot I posted.  That is simply to hit the "Perform GC" 
button in jconsole.  It seems that a full System.gc() *DOES* reclaim heap 
memory that parnew and CMS fail to reclaim.

On my production cluster I have a full-GC via JMX scheduled in a rolling 
fashion every 4 hours.  It's extremely expensive (20-40 seconds of 
unresponsiveness) but is a necessary evil in my situation.  Without it, my 
nodes enter a nasty spiral of constant flushing, constant compactions, high 
heap usage, instability and high latency.


On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:

 Alas, upgrading to 1.1.1 did not solve my issue.
 
 -Original Message-
 From: Brandon Williams [mailto:dri...@gmail.com]
 Sent: Monday, June 04, 2012 11:24 PM
 To: user@cassandra.apache.org
 Subject: Re: memory issue on 1.1.0
 
 Perhaps the deletes: 
 https://issues.apache.org/jira/browse/CASSANDRA-3741
 
 -Brandon
 
 On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L 
 wade.l.poziom...@intel.com wrote:
 Running a very write intensive (new column, delete old column etc.) process 
 and failing on memory.  Log file attached.
 
 Curiously, when I add new data I have never seen this; I have in the past 
 sent hundreds of millions of new transactions.  It seems to happen when I 
 modify.  My process is as follows:
 
 Key slice to get the columns to modify in batches of 100; in separate threads, 
 modify those columns.  I advance the slice, setting the start key each time to 
 the last key in the previous batch.  The mutations are: update a column value in 
 one column family (token), delete a column and add a new column in another (pan).
 
 Runs well until after about 5 million rows then it seems to run out of 
 memory.  Note that these column families are quite small.
 
 WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java 
 (line
 145) Heap is 0.7967470834946492 full.  You may need to reduce 
 memtable and/or cache sizes.  Cassandra will now flush up to the two 
 largest memtables to free up memory.  Adjust 
 flush_largest_memtables_at threshold in cassandra.yaml if you don't 
 want Cassandra to do this automatically  INFO [ScheduledTasks:1] 
 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to 
 reduce heap usage since there are no dirty column families  INFO 
 [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
 InetAddress /10.230.34.170 is now UP  INFO [ScheduledTasks:1] 
 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 
 206 ms for 1 collections, 7345969520 used; max is 8506048512  INFO 
 [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 
 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections,
 5714800208 used; max is 8506048512
 
 
 Keyspace: keyspace
Read Count: 50042632
Read Latency: 0.23157864418482224 ms.
Write Count: 44948323
Write Latency: 0.019460829472992797 ms.
Pending Tasks: 0
Column Family: pan
SSTable count: 5
Space used (live): 1977467326
Space used (total): 1977467326
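The key-slice batching described in the quoted message can be sketched as follows. This is an illustrative stand-in only: a sorted in-memory set plays the role of Cassandra's key ordering, and the method names are mine, not any client API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class SlicePager {
    // Sketch of the batching scheme: take a slice of up to `batchSize` keys
    // starting at a start key, hand the batch off for mutation, then advance
    // the start key past the last key of the previous batch.
    public static List<List<String>> pageKeys(SortedSet<String> keys, int batchSize) {
        List<List<String>> batches = new ArrayList<>();
        String start = keys.isEmpty() ? null : keys.first();
        while (start != null) {
            List<String> batch = new ArrayList<>();
            for (String k : keys.tailSet(start)) {  // slice from the start key
                batch.add(k);
                if (batch.size() == batchSize) break;
            }
            batches.add(batch);
            String last = batch.get(batch.size() - 1);
            // advance: the next slice starts just past the last key seen
            SortedSet<String> rest = keys.tailSet(last + "\0");
            start = rest.isEmpty() ? null : rest.first();
        }
        return batches;
    }

    public static void main(String[] args) {
        SortedSet<String> keys = new TreeSet<>(Arrays.asList("a", "b", "c", "d", "e"));
        System.out.println(pageKeys(keys, 2)); // prints [[a, b], [c, d], [e]]
    }
}
```

With the real Thrift API the last key is typically passed back as the next slice's start key and skipped on arrival, since row keys cannot be "incremented" the way the `+ "\0"` trick does for the in-memory stand-in here.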

RE: memory issue on 1.1.0

2012-06-05 Thread Poziombka, Wade L
Ok, so I have completely refactored to remove deletes and it still fails. So it 
is completely unrelated to deletes.

I guess I need to go back to 1.0.10?  When I originally evaluated I ran 
1.0.8... perhaps I went a bridge too far with 1.1.

I don't think I am doing anything exotic here.

Here is my keyspace definition:

KsDef(name:TB_UNIT, strategy_class:org.apache.cassandra.locator.SimpleStrategy, 
strategy_options:{replication_factor=3}, 
cf_defs:[

CfDef(keyspace:TB_UNIT, name:token, column_type:Standard, 
comparator_type:BytesType, column_metadata:[ColumnDef(name:70 61 6E 45 6E 63, 
validation_class:BytesType), ColumnDef(name:63 72 65 61 74 65 54 73, 
validation_class:DateType), ColumnDef(name:63 72 65 61 74 65 44 61 74 65, 
validation_class:DateType, index_type:KEYS, index_name:TokenCreateDate), 
ColumnDef(name:65 6E 63 72 79 70 74 69 6F 6E 53 65 74 74 69 6E 67 73 49 44, 
validation_class:UTF8Type, index_type:KEYS, index_name:EncryptionSettingsID)], 
caching:keys_only), 

CfDef(keyspace:TB_UNIT, name:pan_d721fd40fd9443aa81cc6f59c8e047c6, 
column_type:Standard, comparator_type:BytesType, caching:keys_only), 

CfDef(keyspace:TB_UNIT, name:counters, column_type:Standard, 
comparator_type:BytesType, column_metadata:[ColumnDef(name:75 73 65 43 6F 75 6E 
74, validation_class:CounterColumnType)], 
default_validation_class:CounterColumnType, caching:keys_only)

])


-Original Message-
From: Poziombka, Wade L [mailto:wade.l.poziom...@intel.com] 
Sent: Tuesday, June 05, 2012 3:09 PM
To: user@cassandra.apache.org
Subject: RE: memory issue on 1.1.0

Thank you.  I do have some of the same observations.  Do you do deletes?

My observation is that without deletes (or column updates, I guess) I can run 
forever, happily.  But when I run (what for me is a batch process) operations 
that delete and modify column values, I run into this.

Reading bug https://issues.apache.org/jira/browse/CASSANDRA-3741 the advice is 
to NOT do deletes individually and to truncate.  I am scrambling to try to do 
this but curious if it will be worth the effort.

Wade

-Original Message-
From: Mina Naguib [mailto:mina.nag...@bloomdigital.com] 
Sent: Tuesday, June 05, 2012 2:24 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0


Hi Wade

I don't know if your scenario matches mine, but I've been struggling with 
memory pressure in 1.x as well.  I made the jump from 0.7.9 to 1.1.0, along 
with enabling compression and levelled compactions, so I don't know which 
specifically is the main culprit.

Specifically, all my nodes seem to lose heap memory.  As parnew and CMS do 
their job, over any reasonable period of time, the floor of memory after a GC 
keeps rising.  This is quite visible if you leave jconsole connected for a day 
or so, and manifests itself as a funny-looking cone like so: 
http://mina.naguib.ca/images/cassandra_jconsole.png

Once memory pressure reaches a point where the heap can't be maintained 
reliably below 75%, cassandra goes into survival mode - via a bunch of tunables 
in cassandra.yaml it'll do things like flush memtables, drop caches, etc. - all 
of which, in my experience, especially with the recent off-heap data 
structures, exacerbate the problem.

I've been meaning, of course, to collect enough technical data to file a bug 
report, but haven't had the time.  I have not yet tested 1.1.1 to see if it 
improves the situation.

What I have found, however, is a band-aid, which you can see at the rightmost 
section of the graph in the screenshot I posted.  That is simply to hit the 
Perform GC button in jconsole.  It seems that a full System.gc() *DOES* reclaim 
heap memory that ParNew and CMS fail to reclaim.

On my production cluster I have a full-GC via JMX scheduled in a rolling 
fashion every 4 hours.  It's extremely expensive (20-40 seconds of 
unresponsiveness) but is a necessary evil in my situation.  Without it, my 
nodes enter a nasty spiral of constant flushing, constant compactions, high 
heap usage, instability and high latency.


On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:

 Alas, upgrading to 1.1.1 did not solve my issue.
 
 -Original Message-
 From: Brandon Williams [mailto:dri...@gmail.com]
 Sent: Monday, June 04, 2012 11:24 PM
 To: user@cassandra.apache.org
 Subject: Re: memory issue on 1.1.0
 
 Perhaps the deletes: 
 https://issues.apache.org/jira/browse/CASSANDRA-3741
 
 -Brandon
 
 On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L 
 wade.l.poziom...@intel.com wrote:
 Running a very write intensive (new column, delete old column etc.) process 
 and failing on memory.  Log file attached.
 
 Curiously, when I add new data I have never seen this; I have in the past 
 sent hundreds of millions of new transactions.  It seems to happen when I 
 modify.  My process is as follows:
 
 key slice to get the columns to modify in batches of 100, then in separate 
 threads modify those columns.  I advance the slice by setting the start key 
 to the last key of the previous batch.  The mutations are: update

Re: memory issue on 1.1.0

2012-06-04 Thread aaron morton
Had a look at the log, this message 

 INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 
 2772) Unable to reduce heap usage since there are no dirty column families
appears correct; it happens after some flush activity and there are no CFs 
with memtable data.  But the heap is still full. 

Overall the server is overloaded, but it seems like it should be handling it 
better. 

What JVM settings do you have? What is the machine spec ? 
What settings do you have for key and row cache ? 
Do the CF's have secondary indexes ?
How many clients / requests per second ? 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote:

 Running a very write intensive (new column, delete old column etc.) process 
 and failing on memory.  Log file attached.
 
 Curiously, when I add new data I have never seen this; I have in the past 
 sent hundreds of millions of new transactions.  It seems to happen when I 
 modify.  My process is as follows:
 
 key slice to get the columns to modify in batches of 100, then in separate 
 threads modify those columns.  I advance the slice by setting the start key 
 to the last key of the previous batch.  The mutations are: update a column 
 value in one column family (token), delete a column and add a new column in 
 another (pan).
 
 Runs well until after about 5 million rows then it seems to run out of 
 memory.  Note that these column families are quite small.
 
 WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) 
 Heap is 0.7967470834946492 full.  You may need to reduce memtable and/or 
 cache sizes.  Cassandra will now flush up to the two largest memtables to 
 free up memory.  Adjust flush_largest_memtables_at threshold in 
 cassandra.yaml if you don't want Cassandra to do this automatically
 INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 
 2772) Unable to reduce heap usage since there are no dirty column families
 INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
 InetAddress /10.230.34.170 is now UP
 INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) 
 GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
 INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) 
 GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max 
 is 8506048512
 
 
 Keyspace: keyspace
Read Count: 50042632
Read Latency: 0.23157864418482224 ms.
Write Count: 44948323
Write Latency: 0.019460829472992797 ms.
Pending Tasks: 0
Column Family: pan
SSTable count: 5
Space used (live): 1977467326
Space used (total): 1977467326
Number of Keys (estimate): 16334848
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 74
Read Count: 14985122
Read Latency: 0.408 ms.
Write Count: 19972441
Write Latency: 0.022 ms.
Pending Tasks: 0
Bloom Filter False Postives: 829
Bloom Filter False Ratio: 0.00073
Bloom Filter Space Used: 37048400
Compacted row minimum size: 125
Compacted row maximum size: 149
Compacted row mean size: 149
 
Column Family: token
SSTable count: 4
Space used (live): 1250973873
Space used (total): 1250973873
Number of Keys (estimate): 14217216
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 49
Read Count: 30059563
Read Latency: 0.167 ms.
Write Count: 14985488
Write Latency: 0.014 ms.
Pending Tasks: 0
Bloom Filter False Postives: 13642
Bloom Filter False Ratio: 0.00322
Bloom Filter Space Used: 28002984
Compacted row minimum size: 150
Compacted row maximum size: 258
Compacted row mean size: 224
 
Column Family: counters
SSTable count: 2
Space used (live): 561549994
Space used (total): 561549994
Number of Keys (estimate): 9985024
Memtable Columns Count: 0
Memtable Data Size: 0
Memtable Switch Count: 38
Read Count: 4997947
Read Latency: 0.092 ms.
Write Count: 9990394
Write Latency: 0.023 ms.
Pending Tasks: 0
Bloom Filter False Postives: 191
Bloom Filter False Ratio: 0.37525
Bloom Filter Space Used: 18741152
Compacted row 

RE: memory issue on 1.1.0

2012-06-04 Thread Poziombka, Wade L
What JVM settings do you have?
-Xms8G
-Xmx8G
-Xmn800m
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-Djava.rmi.server.hostname=127.0.0.1
-Djava.net.preferIPv4Stack=true
-Dcassandra-pidfile=cassandra.pid
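Both the CMS occupancy target above (-XX:CMSInitiatingOccupancyFraction=75) and Cassandra's flush_largest_memtables_at setting revolve around the same quantity: used heap as a fraction of max heap. A minimal sketch of computing it via the standard MemoryMXBean (the 0.75 threshold constant is an assumption for illustration):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapFraction {
    // Fraction of the maximum heap currently in use, from the platform
    // MemoryMXBean. Cassandra's GCInspector warning ("Heap is 0.7967... full")
    // reports the same kind of ratio.
    public static double heapFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return (double) heap.getUsed() / heap.getMax();
    }

    public static void main(String[] args) {
        double threshold = 0.75;  // assumed flush-trigger threshold, for illustration
        double f = heapFraction();
        System.out.printf("Heap is %.4f full%s%n", f,
                f > threshold ? " - above flush threshold" : "");
    }
}
```

Plugged into the ParNew log line quoted in this thread, 7345969520 / 8506048512 is roughly 0.86, well past a 0.75 threshold.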


What is the machine spec ?
It is an RH AS5 x64
16gb memory
2 CPU cores 2.8 Ghz

As it turns out it is somewhat wimpier than I thought.  While weak on CPU, it 
does have a good amount of memory.
It is paired with a larger machine.


What settings do you have for key and row cache ?
A: All the defaults. (yaml template attached);

Do the CF's have secondary indexes ?
A: Yes, one has two.  One of them is used in the key slice that fetches the 
row keys for the further mutations.

How many clients / requests per second ?
A: One client process with 10 threads connected to one of the two nodes in the 
cluster.  One thread reads the slice and puts work in a queue; 9 others read 
from this queue and apply the mutations.  Mutations are completing at roughly 
20,000/minute.


From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, June 04, 2012 4:17 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Had a look at the log, this message

INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) 
Unable to reduce heap usage since there are no dirty column families
appears correct; it happens after some flush activity and there are no CFs 
with memtable data.  But the heap is still full.

Overall the server is overloaded, but it seems like it should be handling it 
better.

What JVM settings do you have? What is the machine spec ?
What settings do you have for key and row cache ?
Do the CF's have secondary indexes ?
How many clients / requests per second ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote:


Running a very write intensive (new column, delete old column etc.) process and 
failing on memory.  Log file attached.

Curiously, when I add new data I have never seen this; I have in the past sent 
hundreds of millions of new transactions.  It seems to happen when I modify.  
My process is as follows:

key slice to get the columns to modify in batches of 100, then in separate 
threads modify those columns.  I advance the slice by setting the start key to 
the last key of the previous batch.  The mutations are: update a column value 
in one column family (token), delete a column and add a new column in another 
(pan).

Runs well until after about 5 million rows then it seems to run out of memory.  
Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) 
Heap is 0.7967470834946492 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) 
Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC 
for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC 
for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 
8506048512


Keyspace: keyspace
   Read Count: 50042632
   Read Latency: 0.23157864418482224 ms.
   Write Count: 44948323
   Write Latency: 0.019460829472992797 ms.
   Pending Tasks: 0
   Column Family: pan
   SSTable count: 5
   Space used (live): 1977467326
   Space used (total): 1977467326
   Number of Keys (estimate): 16334848
   Memtable Columns Count: 0
   Memtable Data Size: 0
   Memtable Switch Count: 74
   Read Count: 14985122
   Read Latency: 0.408 ms.
   Write Count: 19972441
   Write Latency: 0.022 ms.
   Pending Tasks: 0
   Bloom Filter False Postives: 829
   Bloom Filter False Ratio: 0.00073
   Bloom Filter Space Used: 37048400
   Compacted row minimum size: 125
   Compacted row maximum size: 149
   Compacted row mean size: 149

   Column Family: token
   SSTable count: 4
   Space used (live): 1250973873
   Space used (total

RE: memory issue on 1.1.0

2012-06-04 Thread Poziombka, Wade L
I have repeated the test on two quite large machines (12-core, 64 GB AS5 
boxes) and still observed the problem.  Interestingly, at about the same point.

Is there anything I can monitor?  Perhaps I'll hook the YourKit profiler up to 
it to see if there is some kind of leak.

Wade

From: Poziombka, Wade L
Sent: Monday, June 04, 2012 7:23 PM
To: user@cassandra.apache.org
Subject: RE: memory issue on 1.1.0

What JVM settings do you have?
-Xms8G
-Xmx8G
-Xmn800m
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-Djava.rmi.server.hostname=127.0.0.1
-Djava.net.preferIPv4Stack=true
-Dcassandra-pidfile=cassandra.pid


What is the machine spec ?
It is an RH AS5 x64
16gb memory
2 CPU cores 2.8 Ghz

As it turns out it is somewhat wimpier than I thought.  While weak on CPU, it 
does have a good amount of memory.
It is paired with a larger machine.


What settings do you have for key and row cache ?
A: All the defaults. (yaml template attached);

Do the CF's have secondary indexes ?
A: Yes, one has two.  One of them is used in the key slice that fetches the 
row keys for the further mutations.

How many clients / requests per second ?
A: One client process with 10 threads connected to one of the two nodes in the 
cluster.  One thread reads the slice and puts work in a queue; 9 others read 
from this queue and apply the mutations.  Mutations are completing at roughly 
20,000/minute.


From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, June 04, 2012 4:17 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Had a look at the log, this message

INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) 
Unable to reduce heap usage since there are no dirty column families
appears correct; it happens after some flush activity and there are no CFs 
with memtable data.  But the heap is still full.

Overall the server is overloaded, but it seems like it should be handling it 
better.

What JVM settings do you have? What is the machine spec ?
What settings do you have for key and row cache ?
Do the CF's have secondary indexes ?
How many clients / requests per second ?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote:

Running a very write intensive (new column, delete old column etc.) process and 
failing on memory.  Log file attached.

Curiously, when I add new data I have never seen this; I have in the past sent 
hundreds of millions of new transactions.  It seems to happen when I modify.  
My process is as follows:

key slice to get the columns to modify in batches of 100, then in separate 
threads modify those columns.  I advance the slice by setting the start key to 
the last key of the previous batch.  The mutations are: update a column value 
in one column family (token), delete a column and add a new column in another 
(pan).

Runs well until after about 5 million rows then it seems to run out of memory.  
Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) 
Heap is 0.7967470834946492 full.  You may need to reduce memtable and/or cache 
sizes.  Cassandra will now flush up to the two largest memtables to free up 
memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you 
don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) 
Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC 
for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC 
for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 
8506048512


Keyspace: keyspace
   Read Count: 50042632
   Read Latency: 0.23157864418482224 ms.
   Write Count: 44948323
   Write Latency: 0.019460829472992797 ms.
   Pending Tasks: 0
   Column Family: pan
   SSTable count: 5
   Space used (live): 1977467326
   Space used (total): 1977467326
   Number of Keys (estimate): 16334848
   Memtable Columns Count: 0
   Memtable Data Size: 0
   Memtable Switch Count: 74
   Read Count: 14985122
   Read Latency: 0.408 ms.
   Write Count: 19972441
   Write Latency: 0.022 ms.
   Pending Tasks: 0
   Bloom Filter False

Re: memory issue on 1.1.0

2012-06-04 Thread Brandon Williams
Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741

-Brandon

On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L
wade.l.poziom...@intel.com wrote:
 Running a very write intensive (new column, delete old column etc.) process 
 and failing on memory.  Log file attached.

 Curiously, when I add new data I have never seen this; I have in the past 
 sent hundreds of millions of new transactions.  It seems to happen when I 
 modify.  My process is as follows:
 
 key slice to get the columns to modify in batches of 100, then in separate 
 threads modify those columns.  I advance the slice by setting the start key 
 to the last key of the previous batch.  The mutations are: update a column 
 value in one column family (token), delete a column and add a new column in 
 another (pan).

 Runs well until after about 5 million rows then it seems to run out of 
 memory.  Note that these column families are quite small.

 WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) 
 Heap is 0.7967470834946492 full.  You may need to reduce memtable and/or 
 cache sizes.  Cassandra will now flush up to the two largest memtables to 
 free up memory.  Adjust flush_largest_memtables_at threshold in 
 cassandra.yaml if you don't want Cassandra to do this automatically
  INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 
 2772) Unable to reduce heap usage since there are no dirty column families
  INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) 
 InetAddress /10.230.34.170 is now UP
  INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) 
 GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
  INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) 
 GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max 
 is 8506048512

 
 Keyspace: keyspace
        Read Count: 50042632
        Read Latency: 0.23157864418482224 ms.
        Write Count: 44948323
        Write Latency: 0.019460829472992797 ms.
        Pending Tasks: 0
                Column Family: pan
                SSTable count: 5
                Space used (live): 1977467326
                Space used (total): 1977467326
                Number of Keys (estimate): 16334848
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 74
                Read Count: 14985122
                Read Latency: 0.408 ms.
                Write Count: 19972441
                Write Latency: 0.022 ms.
                Pending Tasks: 0
                Bloom Filter False Postives: 829
                Bloom Filter False Ratio: 0.00073
                Bloom Filter Space Used: 37048400
                Compacted row minimum size: 125
                Compacted row maximum size: 149
                Compacted row mean size: 149

                Column Family: token
                SSTable count: 4
                Space used (live): 1250973873
                Space used (total): 1250973873
                Number of Keys (estimate): 14217216
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 49
                Read Count: 30059563
                Read Latency: 0.167 ms.
                Write Count: 14985488
                Write Latency: 0.014 ms.
                Pending Tasks: 0
                Bloom Filter False Postives: 13642
                Bloom Filter False Ratio: 0.00322
                Bloom Filter Space Used: 28002984
                Compacted row minimum size: 150
                Compacted row maximum size: 258
                Compacted row mean size: 224

                Column Family: counters
                SSTable count: 2
                Space used (live): 561549994
                Space used (total): 561549994
                Number of Keys (estimate): 9985024
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 38
                Read Count: 4997947
                Read Latency: 0.092 ms.
                Write Count: 9990394
                Write Latency: 0.023 ms.
                Pending Tasks: 0
                Bloom Filter False Postives: 191
                Bloom Filter False Ratio: 0.37525
                Bloom Filter Space Used: 18741152
                Compacted row minimum size: 125
                Compacted row maximum size: 179
                Compacted row mean size: 150