Re: Is it a memory issue?
Yes, it does mean you're getting ahead of Cassandra's ability to keep up, although I would probably have expected a higher number of pending compactions before you got serious issues (I've seen numbers in the thousands). I notice from the screenshot you provided that you are using secondary indexes. There are a lot of ways to misuse secondary indexes (vs not very many ways to use them well). I think it's possible that what you are seeing is the result of the secondary index on event time (I assume a very high cardinality column). This is a good blog on secondary indexes: http://www.wentnet.com/blog/?p=77

Cheers
Ben

On Mon, 7 Nov 2016 at 16:29 wxn...@zjqunshuo.com <wxn...@zjqunshuo.com> wrote:
> Thanks Ben. I stopped inserting and checked compaction status as you mentioned. Seems there is a lot of compaction work waiting to be done. Please see below. In this case, is it a sign that I am writing faster than C* can process?
>
> One node,
> [root@iZbp11zpafrqfsiys90kzoZ bin]# ./nodetool compactionstats
> pending tasks: 195
> id                                     compaction type   keyspace   table                                completed     total         unit    progress
> 5da60b10-a4a9-11e6-88e9-755b5673a02a   Compaction        cargts     eventdata.eventdata_event_time_idx   1699866872    26536427792   bytes   6.41%
>                                        Compaction        system     hints                                10354379      5172210360    bytes   0.20%
> Active compaction remaining time : 0h29m48s
>
> Another node,
> [root@iZbp1iqnrpsdhoodwii32bZ bin]# ./nodetool compactionstats
> pending tasks: 84
> id                                     compaction type   keyspace   table                                completed     total         unit    progress
> 28a9d010-a4a7-11e6-b985-979fea8d6099   Compaction        cargts     eventdata                            656141400     1424412420    bytes   46.06%
> 7c034840-a48e-11e6-b985-979fea8d6099   Compaction        cargts     eventdata.eventdata_event_time_idx   32098562606   42616107664   bytes   75.32%
> Active compaction remaining time : 0h11m12s
>
> From: Ben Slater <ben.sla...@instaclustr.com>
> Date: 2016-11-07 11:41
> To: user <user@cassandra.apache.org>
> Subject: Re: Is it a memory issue?
>
> This sounds to me like your writes are getting ahead of the compactions trying to keep up, which can eventually cause issues. Keep an eye on nodetool compactionstats; if the number of pending compactions continually climbs then you are writing faster than Cassandra can actually process. If this is happening then you need to either add more processing capacity (nodes) to your cluster or throttle writes on the client side.
>
> It could also be related to conditions like an individual partition growing too big, but I'd check for backed-up compactions first.
>
> Cheers
> Ben
>
> On Mon, 7 Nov 2016 at 14:17 wxn...@zjqunshuo.com <wxn...@zjqunshuo.com> wrote:
>
> Hi All,
> We have one issue in C* testing. At first the inserting was very fast and the TPS was about 30K/s, but when the number of data rows reached 2 billion, the insertion rate decreased badly and the TPS was 20K/s. When the number of rows reached 2.3 billion, the TPS decreased to 0.5K/s, and write timeouts appeared. At last an OOM happened on some nodes and the C* daemon on some nodes crashed. In production we have about 8 billion rows. My testing cluster settings are as below. My question is whether memory is the main issue. Do I need to increase the memory, and what are the right settings for MAX_HEAP_SIZE and HEAP_NEWSIZE?
>
> My cluster setting:
> C* cluster with 3 nodes in Aliyun Cloud
> CPU: 4core
> Memory: 8G
> Disk: 500G
> MAX_HEAP_SIZE=2G
> HEAP_NEWSIZE=500M
>
> My table schema:
>
> CREATE KEYSPACE IF NOT EXISTS cargts WITH REPLICATION = {'class': 'SimpleStrategy','replication_factor':2};
> use cargts;
> CREATE TABLE eventdata (
>     deviceId int,
>     date int,
>     event_time bigint,
>     lat decimal,
>     lon decimal,
>     speed int,
>     heading int,
>     PRIMARY KEY ((deviceId,date),event_time)
> )
> WITH CLUSTERING ORDER BY (event_time ASC);
> CREATE INDEX ON eventdata (event_time);
>
> Best Regards,
> -Simon Wu
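The schema quoted above already clusters rows by event_time inside each (deviceId, date) partition, so per-device time-range queries do not need the secondary index at all. A minimal CQL sketch of that query, with made-up deviceId/timestamp values for illustration (the index name is the auto-generated one visible in the compactionstats output above):

    -- Range scan on the clustering column; no secondary index required
    SELECT event_time, lat, lon, speed, heading
    FROM cargts.eventdata
    WHERE deviceId = 186628 AND date = 20161107
      AND event_time >= 1478476800000 AND event_time < 1478563200000;

    -- Drop the high-cardinality index that is producing the large
    -- eventdata_event_time_idx compactions
    DROP INDEX IF EXISTS cargts.eventdata_event_time_idx;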
Re: Is it a memory issue?
Thanks Ben. I stopped inserting and checked compaction status as you mentioned. Seems there is a lot of compaction work waiting to be done. Please see below. In this case, is it a sign that I am writing faster than C* can process?

One node,
[root@iZbp11zpafrqfsiys90kzoZ bin]# ./nodetool compactionstats
pending tasks: 195
id                                     compaction type   keyspace   table                                completed     total         unit    progress
5da60b10-a4a9-11e6-88e9-755b5673a02a   Compaction        cargts     eventdata.eventdata_event_time_idx   1699866872    26536427792   bytes   6.41%
                                       Compaction        system     hints                                10354379      5172210360    bytes   0.20%
Active compaction remaining time : 0h29m48s

Another node,
[root@iZbp1iqnrpsdhoodwii32bZ bin]# ./nodetool compactionstats
pending tasks: 84
id                                     compaction type   keyspace   table                                completed     total         unit    progress
28a9d010-a4a7-11e6-b985-979fea8d6099   Compaction        cargts     eventdata                            656141400     1424412420    bytes   46.06%
7c034840-a48e-11e6-b985-979fea8d6099   Compaction        cargts     eventdata.eventdata_event_time_idx   32098562606   42616107664   bytes   75.32%
Active compaction remaining time : 0h11m12s

From: Ben Slater
Date: 2016-11-07 11:41
To: user
Subject: Re: Is it a memory issue?

This sounds to me like your writes are getting ahead of the compactions trying to keep up, which can eventually cause issues. Keep an eye on nodetool compactionstats; if the number of pending compactions continually climbs then you are writing faster than Cassandra can actually process. If this is happening then you need to either add more processing capacity (nodes) to your cluster or throttle writes on the client side.

It could also be related to conditions like an individual partition growing too big, but I'd check for backed-up compactions first.

Cheers
Ben

On Mon, 7 Nov 2016 at 14:17 wxn...@zjqunshuo.com <wxn...@zjqunshuo.com> wrote:

Hi All,
We have one issue in C* testing. At first the inserting was very fast and the TPS was about 30K/s, but when the number of data rows reached 2 billion, the insertion rate decreased badly and the TPS was 20K/s. When the number of rows reached 2.3 billion, the TPS decreased to 0.5K/s, and write timeouts appeared. At last an OOM happened on some nodes and the C* daemon on some nodes crashed. In production we have about 8 billion rows. My testing cluster settings are as below. My question is whether memory is the main issue. Do I need to increase the memory, and what are the right settings for MAX_HEAP_SIZE and HEAP_NEWSIZE?

My cluster setting:
C* cluster with 3 nodes in Aliyun Cloud
CPU: 4core
Memory: 8G
Disk: 500G
MAX_HEAP_SIZE=2G
HEAP_NEWSIZE=500M

My table schema:

CREATE KEYSPACE IF NOT EXISTS cargts WITH REPLICATION = {'class': 'SimpleStrategy','replication_factor':2};
use cargts;
CREATE TABLE eventdata (
    deviceId int,
    date int,
    event_time bigint,
    lat decimal,
    lon decimal,
    speed int,
    heading int,
    PRIMARY KEY ((deviceId,date),event_time)
)
WITH CLUSTERING ORDER BY (event_time ASC);
CREATE INDEX ON eventdata (event_time);

Best Regards,
-Simon Wu
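A quick way to watch whether a backlog like the one above drains once inserts stop, and to check whether compaction is being throttled, is sketched below (setcompactionthroughput takes MB/s; 0 removes the cap, which can saturate small disks):

    # Watch the pending-compactions backlog
    while true; do ./nodetool compactionstats | grep "pending tasks"; sleep 60; done

    # Check, and if needed raise, the compaction throughput cap
    ./nodetool getcompactionthroughput
    ./nodetool setcompactionthroughput 0   # 0 = unthrottled; use with care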
Re: Is it a memory issue?
This sounds to me like your writes are getting ahead of the compactions trying to keep up, which can eventually cause issues. Keep an eye on nodetool compactionstats; if the number of pending compactions continually climbs then you are writing faster than Cassandra can actually process. If this is happening then you need to either add more processing capacity (nodes) to your cluster or throttle writes on the client side.

It could also be related to conditions like an individual partition growing too big, but I'd check for backed-up compactions first.

Cheers
Ben

On Mon, 7 Nov 2016 at 14:17 wxn...@zjqunshuo.com wrote:
> Hi All,
> We have one issue in C* testing. At first the inserting was very fast and the TPS was about 30K/s, but when the number of data rows reached 2 billion, the insertion rate decreased badly and the TPS was 20K/s. When the number of rows reached 2.3 billion, the TPS decreased to 0.5K/s, and write timeouts appeared. At last an OOM happened on some nodes and the C* daemon on some nodes crashed. In production we have about 8 billion rows. My testing cluster settings are as below. My question is whether memory is the main issue. Do I need to increase the memory, and what are the right settings for MAX_HEAP_SIZE and HEAP_NEWSIZE?
>
> My cluster setting:
> C* cluster with 3 nodes in Aliyun Cloud
> CPU: 4core
> Memory: 8G
> Disk: 500G
> MAX_HEAP_SIZE=2G
> HEAP_NEWSIZE=500M
>
> My table schema:
>
> CREATE KEYSPACE IF NOT EXISTS cargts WITH REPLICATION = {'class': 'SimpleStrategy','replication_factor':2};
> use cargts;
> CREATE TABLE eventdata (
>     deviceId int,
>     date int,
>     event_time bigint,
>     lat decimal,
>     lon decimal,
>     speed int,
>     heading int,
>     PRIMARY KEY ((deviceId,date),event_time)
> )
> WITH CLUSTERING ORDER BY (event_time ASC);
> CREATE INDEX ON eventdata (event_time);
>
> Best Regards,
> -Simon Wu
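One way to find the write rate a cluster can actually sustain is to drive it with cassandra-stress at a fixed throttle and step the rate down until pending compactions stay flat. A rough sketch; the option spelling varies by version (older releases use -rate limit= rather than throttle=), and the node address and rates here are illustrative:

    # Write at a capped rate while watching nodetool compactionstats elsewhere
    tools/bin/cassandra-stress write n=10000000 \
        -rate threads=50 throttle=20000/s -node 10.0.0.1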
Is it a memory issue?
Hi All,
We have one issue in C* testing. At first the inserting was very fast and the TPS was about 30K/s, but when the number of data rows reached 2 billion, the insertion rate decreased badly and the TPS was 20K/s. When the number of rows reached 2.3 billion, the TPS decreased to 0.5K/s, and write timeouts appeared. At last an OOM happened on some nodes and the C* daemon on some nodes crashed. In production we have about 8 billion rows. My testing cluster settings are as below. My question is whether memory is the main issue. Do I need to increase the memory, and what are the right settings for MAX_HEAP_SIZE and HEAP_NEWSIZE?

My cluster setting:
C* cluster with 3 nodes in Aliyun Cloud
CPU: 4core
Memory: 8G
Disk: 500G
MAX_HEAP_SIZE=2G
HEAP_NEWSIZE=500M

My table schema:

CREATE KEYSPACE IF NOT EXISTS cargts WITH REPLICATION = {'class': 'SimpleStrategy','replication_factor':2};
use cargts;
CREATE TABLE eventdata (
    deviceId int,
    date int,
    event_time bigint,
    lat decimal,
    lon decimal,
    speed int,
    heading int,
    PRIMARY KEY ((deviceId,date),event_time)
)
WITH CLUSTERING ORDER BY (event_time ASC);
CREATE INDEX ON eventdata (event_time);

Best Regards,
-Simon Wu
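For context, the stock cassandra-env.sh picks the heap roughly as the larger of min(1/2 RAM, 1 GB) and min(1/4 RAM, 8 GB), with new-gen at min(100 MB per core, 1/4 heap). A sketch of what that rule of thumb gives for the 8 GB / 4-core nodes described above (values illustrative, not a tuning recommendation):

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="2G"     # ~1/4 of 8 GB RAM
    HEAP_NEWSIZE="400M"    # ~100 MB per core

In other words, the 2G/500M already in use is close to what the defaults would compute, which points more at overall node capacity than at heap mis-tuning.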
Re: Memory issue
> As soon as it starts, the JVM gets killed because of a memory issue.

What is the memory issue that kills the JVM? The log message below is simply a warning:

WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.

Is there anything in the system logs?

Cheers
Aaron

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder
Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 24/05/2014, at 9:17 am, Robert Coli <rc...@eventbrite.com> wrote:

On Fri, May 23, 2014 at 2:08 PM, opensaf dev <opensaf...@gmail.com> wrote:
> I have a different service which controls the cassandra service for high availability.

IMO, starting or stopping a Cassandra node should never be a side effect of another system's properties. YMMV.

See https://issues.apache.org/jira/browse/CASSANDRA-2356 for some related comments.

=Rob
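If the JVM is disappearing rather than dying with an OutOfMemoryError in its own log, the Linux oom-killer is a likely suspect, and it records its kills in the kernel log. A quick check (log paths vary by distro; /var/log/messages is the CentOS default):

    dmesg | grep -i "killed process"
    grep -i oom /var/log/messages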
Re: Memory issue
On Wed, May 21, 2014 at 12:59 AM, opensaf dev <opensaf...@gmail.com> wrote:
> When I run as user cassandra, it starts and runs fine.

Why do you want to run Cassandra as a different user?

--
Patricia Gorla
@patriciagorla

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
Re: Memory issue
I have a different service which controls the cassandra service for high availability.

Thanks
Dev

On Fri, May 23, 2014 at 7:35 AM, Patricia Gorla <patri...@thelastpickle.com> wrote:
> On Wed, May 21, 2014 at 12:59 AM, opensaf dev <opensaf...@gmail.com> wrote:
>> When I run as user cassandra, it starts and runs fine.
>
> Why do you want to run Cassandra as a different user?
>
> --
> Patricia Gorla
> @patriciagorla
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
Re: Memory issue
On Fri, May 23, 2014 at 2:08 PM, opensaf dev <opensaf...@gmail.com> wrote:
> I have a different service which controls the cassandra service for high availability.

IMO, starting or stopping a Cassandra node should never be a side effect of another system's properties. YMMV.

See https://issues.apache.org/jira/browse/CASSANDRA-2356 for some related comments.

=Rob
Re: Memory issue
Well Romain, I had tried restarting the VM as well, but the problem still remained. What I noticed is that after some time, irrespective of whether I run Cassandra as the other user or as the normal cassandra user, the problem remains. As soon as it starts, the JVM gets killed because of a memory issue.

Are there other settings besides the limits.conf file that I need to configure? Note that I don't have an /etc/security/limits.d/cassandra.conf file; I just configured the limits in limits.conf. I even made the group ID of both users (cassandra, X) the same, but no use.

Is there anything that requires Cassandra to be started under the cassandra user only? What special configuration is required to run Cassandra under a different user?

Thanks
Dev

On Tue, May 20, 2014 at 10:44 PM, Romain HARDOUIN <romain.hardo...@urssaf.fr> wrote:
> Well... you have already changed the limits ;-)
> Keep in mind that changes in the limits.conf file will not affect processes that are already running.
>
> opensaf dev <opensaf...@gmail.com> wrote on 21/05/2014 06:59:05:
> From: opensaf dev <opensaf...@gmail.com>
> To: user@cassandra.apache.org
> Date: 21/05/2014 07:00
> Subject: Memory issue
>
> Hi guys,
> I am trying to run Cassandra on CentOS as a user X other than root or cassandra. When I run as user cassandra, it starts and runs fine. But when I run under user X, I get the below error once Cassandra starts, and the system freezes totally.
>
> Insufficient memlock settings:
> WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
>
> I have tried the tips available online to change the memlock and other limits for both users cassandra and X, but that did not solve the problem. What else should I consider when running Cassandra as a user other than cassandra/root?
>
> Any help is much appreciated.
>
> Thanks
> Dev
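One way to confirm whether the limits.conf changes actually reached the running JVM is to read the live process's limits from /proc, since limits.conf only applies to sessions started after the edit (and only when PAM applies pam_limits). A sketch, assuming a single Cassandra JVM on the box:

    CASS_PID=$(pgrep -f CassandraDaemon | head -1)
    grep -E "locked memory|open files" /proc/$CASS_PID/limits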
Memory issue
Hi guys,
I am trying to run Cassandra on CentOS as a user X other than root or cassandra. When I run as user cassandra, it starts and runs fine. But when I run under user X, I get the below error once Cassandra starts, and the system freezes totally.

*Insufficient memlock settings:*
WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.

I have tried the tips available online to change the memlock and other limits for both users cassandra and X, but that did not solve the problem. What else should I consider when running Cassandra as a user other than cassandra/root?

Any help is much appreciated.

Thanks
Dev
RE: Memory issue
Hi,

You have to define limits for the user. Here is an example for the user cassandra:

# cat /etc/security/limits.d/cassandra.conf
cassandra - memlock unlimited
cassandra - nofile 10

best,
Romain

opensaf dev <opensaf...@gmail.com> wrote on 21/05/2014 06:59:05:
> From: opensaf dev <opensaf...@gmail.com>
> To: user@cassandra.apache.org
> Date: 21/05/2014 07:00
> Subject: Memory issue
>
> Hi guys,
> I am trying to run Cassandra on CentOS as a user X other than root or cassandra. When I run as user cassandra, it starts and runs fine. But when I run under user X, I get the below error once Cassandra starts, and the system freezes totally.
>
> Insufficient memlock settings:
> WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
>
> I have tried the tips available online to change the memlock and other limits for both users cassandra and X, but that did not solve the problem. What else should I consider when running Cassandra as a user other than cassandra/root?
>
> Any help is much appreciated.
>
> Thanks
> Dev
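Files under /etc/security/limits.d/ use the same syntax as limits.conf and take effect only for sessions opened after the change. A quick check of what a fresh session for user X would get (the username is illustrative; common recommendations are memlock unlimited and a nofile value in the tens of thousands):

    su - X -s /bin/bash -c 'ulimit -l; ulimit -n'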
RE: Memory issue
Well... you have already changed the limits ;-)
Keep in mind that changes in the limits.conf file will not affect processes that are already running.

opensaf dev <opensaf...@gmail.com> wrote on 21/05/2014 06:59:05:
> From: opensaf dev <opensaf...@gmail.com>
> To: user@cassandra.apache.org
> Date: 21/05/2014 07:00
> Subject: Memory issue
>
> Hi guys,
> I am trying to run Cassandra on CentOS as a user X other than root or cassandra. When I run as user cassandra, it starts and runs fine. But when I run under user X, I get the below error once Cassandra starts, and the system freezes totally.
>
> Insufficient memlock settings:
> WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out, especially with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root.
>
> I have tried the tips available online to change the memlock and other limits for both users cassandra and X, but that did not solve the problem. What else should I consider when running Cassandra as a user other than cassandra/root?
>
> Any help is much appreciated.
>
> Thanks
> Dev
Re: memory issue on 1.1.0
Mina,
That does not sound right. If you have the time, can you create a jira ticket describing the problem? Please include:

* the GC logs, gathered by enabling them here https://github.com/apache/cassandra/blob/trunk/conf/cassandra-env.sh#L165 (it would be good to see the node get into trouble if possible)
* OS, JVM and Cassandra versions
* information on the schema and workload
* anything else you think is important.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 7:24 AM, Mina Naguib wrote:

Hi Wade

I don't know if your scenario matches mine, but I've been struggling with memory pressure in 1.x as well. I made the jump from 0.7.9 to 1.1.0, along with enabling compression and levelled compactions, so I don't know which specifically is the main culprit.

Specifically, all my nodes seem to lose heap memory. As parnew and CMS do their job, over any reasonable period of time, the floor of memory after a GC keeps rising. This is quite visible if you leave jconsole connected for a day or so, and manifests itself as a funny-looking cone like so: http://mina.naguib.ca/images/cassandra_jconsole.png

Once memory pressure reaches a point where the heap can't be maintained reliably below 75%, cassandra goes into survival mode - via a bunch of tunables in cassandra.yaml it'll do things like flush memtables, drop caches, etc - all of which, in my experience, especially with the recent off-heap data structures, exacerbate the problem.

I've been meaning, of course, to collect enough technical data to file a bug report, but haven't had the time. I have not yet tested 1.1.1 to see if it improves the situation.

What I have found, however, is a band-aid, which you see at the rightmost section of the graph in the screenshot I posted. That is simply to hit the "Perform GC" button in jconsole. It seems that a full System.gc() *DOES* reclaim heap memory that parnew and CMS fail to reclaim.

On my production cluster I have a full GC via JMX scheduled in a rolling fashion every 4 hours. It's extremely expensive (20-40 seconds of unresponsiveness) but is a necessary evil in my situation. Without it, my nodes enter a nasty spiral of constant flushing, constant compactions, high heap usage, instability and high latency.

On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:

Alas, upgrading to 1.1.1 did not solve my issue.

-----Original Message-----
From: Brandon Williams [mailto:dri...@gmail.com]
Sent: Monday, June 04, 2012 11:24 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741

-Brandon

On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L <wade.l.poziom...@intel.com> wrote:

Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached.

Curiously, when I add new data I have never seen this (I have in the past sent hundreds of millions of new transactions). It seems to happen when I modify.

My process is as follows: key slice to get columns to modify in batches of 100, then in separate threads modify those columns. I advance the slice each time, with the start key set to the last key in the previous batch. The mutations done are: update a column value in one column family (token), delete a column and add a new column in another (pan).

It runs well until after about 5 million rows, then it seems to run out of memory. Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512

Keyspace: keyspace
  Read Count: 50042632
  Read Latency: 0.23157864418482224 ms.
  Write Count: 44948323
  Write Latency: 0.019460829472992797 ms.
  Pending Tasks: 0
    Column Family: pan
    SSTable count: 5
    Space used (live): 1977467326
    Space used (total): 1977467326
    Number of Keys (estimate): 16334848
    Memtable
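For what it's worth, the full System.gc() that Mina triggers from jconsole can also be scripted from a shell on JDK 7+, which makes the rolling four-hourly band-aid cron-able. A sketch (run as the same user as the Cassandra process):

    CASS_PID=$(pgrep -f CassandraDaemon | head -1)
    jcmd $CASS_PID GC.run     # equivalent to System.gc(); expect a multi-second pause
    # on JDK 6, "jmap -histo:live $CASS_PID" also forces a full collection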
Re: memory issue on 1.1.0
I looked through the log again. Still looks like it's overloaded and not handling the overload very well. It looks like a sustained write load of around 280K columns every 5 minutes for about 5 hours. It may be that the CPU is the bottleneck when it comes to GC throughput. You are hitting ParNew issues from the very start, and end up with 20 second CMS.

Do you see high CPU load?

Can you enable the GC logging options in cassandra-env.sh?

Can you throttle back the test to a level where the server does not fail? Alternatively, can you dump the heap when it gets full and see what is taking up all the space?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 2:12 PM, Poziombka, Wade L wrote:

Ok, so I have completely refactored to remove deletes and it still fails. So it is completely unrelated to deletes. I guess I need to go back to 1.0.10? When I originally evaluated I ran 1.0.8... perhaps I went a bridge too far with 1.1. I don't think I am doing anything exotic here. Here is my column family:

KsDef(name:TB_UNIT,
  strategy_class:org.apache.cassandra.locator.SimpleStrategy,
  strategy_options:{replication_factor=3},
  cf_defs:[
    CfDef(keyspace:TB_UNIT, name:token, column_type:Standard, comparator_type:BytesType,
      column_metadata:[
        ColumnDef(name:70 61 6E 45 6E 63, validation_class:BytesType),
        ColumnDef(name:63 72 65 61 74 65 54 73, validation_class:DateType),
        ColumnDef(name:63 72 65 61 74 65 44 61 74 65, validation_class:DateType, index_type:KEYS, index_name:TokenCreateDate),
        ColumnDef(name:65 6E 63 72 79 70 74 69 6F 6E 53 65 74 74 69 6E 67 73 49 44, validation_class:UTF8Type, index_type:KEYS, index_name:EncryptionSettingsID)],
      caching:keys_only),
    CfDef(keyspace:TB_UNIT, name:pan_d721fd40fd9443aa81cc6f59c8e047c6, column_type:Standard, comparator_type:BytesType, caching:keys_only),
    CfDef(keyspace:TB_UNIT, name:counters, column_type:Standard, comparator_type:BytesType,
      column_metadata:[ColumnDef(name:75 73 65 43 6F 75 6E 74, validation_class:CounterColumnType)],
      default_validation_class:CounterColumnType, caching:keys_only)
  ])

-----Original Message-----
From: Poziombka, Wade L [mailto:wade.l.poziom...@intel.com]
Sent: Tuesday, June 05, 2012 3:09 PM
To: user@cassandra.apache.org
Subject: RE: memory issue on 1.1.0

Thank you. I do have some of the same observations. Do you do deletes? My observation is that without deletes (or column updates I guess) I can run forever happy, but when I run (what for me is a batch process) operations that delete and modify column values, I run into this.

Reading bug https://issues.apache.org/jira/browse/CASSANDRA-3741, the advice is to NOT do deletes individually but to truncate. I am scrambling to try to do this, but am curious whether it will be worth the effort.

Wade

-----Original Message-----
From: Mina Naguib [mailto:mina.nag...@bloomdigital.com]
Sent: Tuesday, June 05, 2012 2:24 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Hi Wade

I don't know if your scenario matches mine, but I've been struggling with memory pressure in 1.x as well. I made the jump from 0.7.9 to 1.1.0, along with enabling compression and levelled compactions, so I don't know which specifically is the main culprit.

Specifically, all my nodes seem to lose heap memory. As parnew and CMS do their job, over any reasonable period of time, the floor of memory after a GC keeps rising. This is quite visible if you leave jconsole connected for a day or so, and manifests itself as a funny-looking cone like so: http://mina.naguib.ca/images/cassandra_jconsole.png

Once memory pressure reaches a point where the heap can't be maintained reliably below 75%, cassandra goes into survival mode - via a bunch of tunables in cassandra.yaml it'll do things like flush memtables, drop caches, etc - all of which, in my experience, especially with the recent off-heap data structures, exacerbate the problem.

I've been meaning, of course, to collect enough technical data to file a bug report, but haven't had the time. I have not yet tested 1.1.1 to see if it improves the situation.

What I have found, however, is a band-aid, which you see at the rightmost section of the graph in the screenshot I posted. That is simply to hit the "Perform GC" button in jconsole. It seems that a full System.gc() *DOES* reclaim heap memory that parnew and CMS fail to reclaim.

On my production cluster I have a full GC via JMX scheduled in a rolling fashion every 4 hours. It's extremely expensive (20-40 seconds of unresponsiveness) but is a necessary evil in my situation. Without it, my nodes enter a nasty spiral of constant flushing, constant compactions, high heap usage, instability and high latency.

On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:

Alas, upgrading to 1.1.1 did not solve
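The GC logging Aaron mentions is a commented-out block in cassandra-env.sh; it amounts to roughly these JVM options (the exact set varies by version):

    # uncomment/add in conf/cassandra-env.sh
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"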
Re: memory issue on 1.1.0
Just to check, do you have JNA setup correctly? (You should see a couple of log messages about it shortly after startup.)

Truncate also performs a snapshot by default.

On Wed, Jun 6, 2012 at 12:38 PM, Poziombka, Wade L <wade.l.poziom...@intel.com> wrote:
> However, after all the work I issued a truncate on the old column family (the one replaced by this process) and I get an out of memory condition then.

--
Tyler Hobbs
DataStax http://datastax.com/
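Two quick checks for the points Tyler raises, assuming default log and config locations (the exact JNA log wording differs across versions):

    # JNA: a successful hookup logs something like "JNA mlockall successful"
    grep -i jna /var/log/cassandra/system.log

    # snapshots on truncate/drop are controlled by auto_snapshot in cassandra.yaml
    grep auto_snapshot conf/cassandra.yaml

    # reclaim disk from old snapshots
    ./nodetool clearsnapshot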
RE: memory issue on 1.1.0
I believe so. There are no warnings on startup.

So is there a preferred way to completely eliminate a column family?

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, June 06, 2012 1:17 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Just to check, do you have JNA setup correctly? (You should see a couple of log messages about it shortly after startup.)

Truncate also performs a snapshot by default.

On Wed, Jun 6, 2012 at 12:38 PM, Poziombka, Wade L <wade.l.poziom...@intel.com> wrote:
> However, after all the work I issued a truncate on the old column family (the one replaced by this process) and I get an out of memory condition then.

--
Tyler Hobbs
DataStax http://datastax.com/
Re: memory issue on 1.1.0
Use drop. Truncate is mostly for unit tests.

A
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 7/06/2012, at 6:22 AM, Poziombka, Wade L wrote:

I believe so. There are no warnings on startup.

So is there a preferred way to completely eliminate a column family?

From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, June 06, 2012 1:17 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Just to check, do you have JNA setup correctly? (You should see a couple of log messages about it shortly after startup.)

Truncate also performs a snapshot by default.

On Wed, Jun 6, 2012 at 12:38 PM, Poziombka, Wade L <wade.l.poziom...@intel.com> wrote:
> However, after all the work I issued a truncate on the old column family (the one replaced by this process) and I get an out of memory condition then.

--
Tyler Hobbs
DataStax
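On 1.1, with the Thrift-era schema shown in the KsDef earlier in this thread, the drop can be issued from cassandra-cli; a sketch using the column family name from this thread:

    $ bin/cassandra-cli -h localhost
    [default@unknown] use TB_UNIT;
    [default@TB_UNIT] drop column family pan_d721fd40fd9443aa81cc6f59c8e047c6;

Like truncate, drop snapshots the data first when auto_snapshot is enabled, so a nodetool clearsnapshot afterwards recovers the disk space.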
RE: memory issue on 1.1.0
Alas, upgrading to 1.1.1 did not solve my issue.

-----Original Message-----
From: Brandon Williams [mailto:dri...@gmail.com]
Sent: Monday, June 04, 2012 11:24 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741

-Brandon

On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L <wade.l.poziom...@intel.com> wrote:

Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached.

Curiously, when I add new data I have never seen this (I have in the past sent hundreds of millions of new transactions). It seems to happen when I modify.

My process is as follows: key slice to get columns to modify in batches of 100, then in separate threads modify those columns. I advance the slice each time, with the start key set to the last key in the previous batch. The mutations done are: update a column value in one column family (token), delete a column and add a new column in another (pan).

It runs well until after about 5 million rows, then it seems to run out of memory. Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512

Keyspace: keyspace
  Read Count: 50042632
  Read Latency: 0.23157864418482224 ms.
  Write Count: 44948323
  Write Latency: 0.019460829472992797 ms.
  Pending Tasks: 0
    Column Family: pan
    SSTable count: 5
    Space used (live): 1977467326
    Space used (total): 1977467326
    Number of Keys (estimate): 16334848
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 74
    Read Count: 14985122
    Read Latency: 0.408 ms.
    Write Count: 19972441
    Write Latency: 0.022 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 829
    Bloom Filter False Ratio: 0.00073
    Bloom Filter Space Used: 37048400
    Compacted row minimum size: 125
    Compacted row maximum size: 149
    Compacted row mean size: 149
    Column Family: token
    SSTable count: 4
    Space used (live): 1250973873
    Space used (total): 1250973873
    Number of Keys (estimate): 14217216
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 49
    Read Count: 30059563
    Read Latency: 0.167 ms.
    Write Count: 14985488
    Write Latency: 0.014 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 13642
    Bloom Filter False Ratio: 0.00322
    Bloom Filter Space Used: 28002984
    Compacted row minimum size: 150
    Compacted row maximum size: 258
    Compacted row mean size: 224
    Column Family: counters
    SSTable count: 2
    Space used (live): 561549994
    Space used (total): 561549994
    Number of Keys (estimate): 9985024
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 38
    Read Count: 4997947
    Read Latency: 0.092 ms.
    Write Count: 9990394
    Write Latency: 0.023 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 191
    Bloom Filter False Ratio: 0.37525
    Bloom Filter Space Used: 18741152
    Compacted row minimum size: 125
    Compacted row maximum size: 179
    Compacted row mean size: 150
Re: memory issue on 1.1.0
Hi Wade

I don't know if your scenario matches mine, but I've been struggling with memory pressure in 1.x as well. I made the jump from 0.7.9 to 1.1.0, along with enabling compression and levelled compactions, so I don't know which specifically is the main culprit.

Specifically, all my nodes seem to lose heap memory. As parnew and CMS do their job, over any reasonable period of time, the floor of memory after a GC keeps rising. This is quite visible if you leave jconsole connected for a day or so, and manifests itself as a funny-looking cone like so: http://mina.naguib.ca/images/cassandra_jconsole.png

Once memory pressure reaches a point where the heap can't be maintained reliably below 75%, cassandra goes into survival mode - via a bunch of tunables in cassandra.yaml it'll do things like flush memtables, drop caches, etc - all of which, in my experience, especially with the recent off-heap data structures, exacerbate the problem.

I've been meaning, of course, to collect enough technical data to file a bug report, but haven't had the time. I have not yet tested 1.1.1 to see if it improves the situation.

What I have found, however, is a band-aid, which you see at the rightmost section of the graph in the screenshot I posted. That is simply to hit the "Perform GC" button in jconsole. It seems that a full System.gc() *DOES* reclaim heap memory that parnew and CMS fail to reclaim.

On my production cluster I have a full GC via JMX scheduled in a rolling fashion every 4 hours. It's extremely expensive (20-40 seconds of unresponsiveness) but is a necessary evil in my situation. Without it, my nodes enter a nasty spiral of constant flushing, constant compactions, high heap usage, instability and high latency.

On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:

Alas, upgrading to 1.1.1 did not solve my issue.

-----Original Message-----
From: Brandon Williams [mailto:dri...@gmail.com]
Sent: Monday, June 04, 2012 11:24 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741

-Brandon

On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L <wade.l.poziom...@intel.com> wrote:

Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached.

Curiously, when I add new data I have never seen this (I have in the past sent hundreds of millions of new transactions). It seems to happen when I modify.

My process is as follows: key slice to get columns to modify in batches of 100, then in separate threads modify those columns. I advance the slice each time, with the start key set to the last key in the previous batch. The mutations done are: update a column value in one column family (token), delete a column and add a new column in another (pan).

It runs well until after about 5 million rows, then it seems to run out of memory. Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512

Keyspace: keyspace
  Read Count: 50042632
  Read Latency: 0.23157864418482224 ms.
  Write Count: 44948323
  Write Latency: 0.019460829472992797 ms.
  Pending Tasks: 0
    Column Family: pan
    SSTable count: 5
    Space used (live): 1977467326
    Space used (total): 1977467326
    Number of Keys (estimate): 16334848
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 74
    Read Count: 14985122
    Read Latency: 0.408 ms.
    Write Count: 19972441
    Write Latency: 0.022 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 829
    Bloom Filter False Ratio: 0.00073
    Bloom Filter Space Used: 37048400
    Compacted row minimum size: 125
    Compacted row maximum size: 149
    Compacted row mean size: 149
    Column Family: token
RE: memory issue on 1.1.0
Thank you. I do have some of the same observations. Do you do deletes? My observation is that without deletes (or column updates I guess) I can run forever happy, but when I run (what for me is a batch process) operations that delete and modify column values, I run into this.

Reading bug https://issues.apache.org/jira/browse/CASSANDRA-3741, the advice is to NOT do deletes individually but to truncate. I am scrambling to try to do this, but am curious whether it will be worth the effort.

Wade

-----Original Message-----
From: Mina Naguib [mailto:mina.nag...@bloomdigital.com]
Sent: Tuesday, June 05, 2012 2:24 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Hi Wade

I don't know if your scenario matches mine, but I've been struggling with memory pressure in 1.x as well. I made the jump from 0.7.9 to 1.1.0, along with enabling compression and levelled compactions, so I don't know which specifically is the main culprit.

Specifically, all my nodes seem to lose heap memory. As parnew and CMS do their job, over any reasonable period of time, the floor of memory after a GC keeps rising. This is quite visible if you leave jconsole connected for a day or so, and manifests itself as a funny-looking cone like so: http://mina.naguib.ca/images/cassandra_jconsole.png

Once memory pressure reaches a point where the heap can't be maintained reliably below 75%, cassandra goes into survival mode - via a bunch of tunables in cassandra.yaml it'll do things like flush memtables, drop caches, etc - all of which, in my experience, especially with the recent off-heap data structures, exacerbate the problem.

I've been meaning, of course, to collect enough technical data to file a bug report, but haven't had the time. I have not yet tested 1.1.1 to see if it improves the situation.

What I have found, however, is a band-aid, which you see at the rightmost section of the graph in the screenshot I posted. That is simply to hit the "Perform GC" button in jconsole. It seems that a full System.gc() *DOES* reclaim heap memory that parnew and CMS fail to reclaim.

On my production cluster I have a full GC via JMX scheduled in a rolling fashion every 4 hours. It's extremely expensive (20-40 seconds of unresponsiveness) but is a necessary evil in my situation. Without it, my nodes enter a nasty spiral of constant flushing, constant compactions, high heap usage, instability and high latency.

On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:

Alas, upgrading to 1.1.1 did not solve my issue.

-----Original Message-----
From: Brandon Williams [mailto:dri...@gmail.com]
Sent: Monday, June 04, 2012 11:24 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741

-Brandon

On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L <wade.l.poziom...@intel.com> wrote:

Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached.

Curiously, when I add new data I have never seen this (I have in the past sent hundreds of millions of new transactions). It seems to happen when I modify.

My process is as follows: key slice to get columns to modify in batches of 100, then in separate threads modify those columns. I advance the slice each time, with the start key set to the last key in the previous batch. The mutations done are: update a column value in one column family (token), delete a column and add a new column in another (pan).

It runs well until after about 5 million rows, then it seems to run out of memory. Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512

Keyspace: keyspace
  Read Count: 50042632
  Read Latency: 0.23157864418482224 ms.
  Write Count: 44948323
  Write Latency: 0.019460829472992797 ms.
  Pending Tasks: 0
    Column Family: pan
    SSTable count: 5
    Space used (live): 1977467326
    Space used (total): 1977467326
RE: memory issue on 1.1.0
Ok, so I have completely refactored to remove deletes and it still fails. So it is completely unrelated to deletes. I guess I need to go back to 1.0.10? When I originally evaluated I ran 1.0.8... perhaps I went a bridge too far with 1.1. I don't think I am doing anything exotic here. Here is my column family:

KsDef(name:TB_UNIT,
  strategy_class:org.apache.cassandra.locator.SimpleStrategy,
  strategy_options:{replication_factor=3},
  cf_defs:[
    CfDef(keyspace:TB_UNIT, name:token, column_type:Standard, comparator_type:BytesType,
      column_metadata:[
        ColumnDef(name:70 61 6E 45 6E 63, validation_class:BytesType),
        ColumnDef(name:63 72 65 61 74 65 54 73, validation_class:DateType),
        ColumnDef(name:63 72 65 61 74 65 44 61 74 65, validation_class:DateType, index_type:KEYS, index_name:TokenCreateDate),
        ColumnDef(name:65 6E 63 72 79 70 74 69 6F 6E 53 65 74 74 69 6E 67 73 49 44, validation_class:UTF8Type, index_type:KEYS, index_name:EncryptionSettingsID)],
      caching:keys_only),
    CfDef(keyspace:TB_UNIT, name:pan_d721fd40fd9443aa81cc6f59c8e047c6, column_type:Standard, comparator_type:BytesType, caching:keys_only),
    CfDef(keyspace:TB_UNIT, name:counters, column_type:Standard, comparator_type:BytesType,
      column_metadata:[ColumnDef(name:75 73 65 43 6F 75 6E 74, validation_class:CounterColumnType)],
      default_validation_class:CounterColumnType, caching:keys_only)
  ])

-----Original Message-----
From: Poziombka, Wade L [mailto:wade.l.poziom...@intel.com]
Sent: Tuesday, June 05, 2012 3:09 PM
To: user@cassandra.apache.org
Subject: RE: memory issue on 1.1.0

Thank you. I do have some of the same observations. Do you do deletes? My observation is that without deletes (or column updates I guess) I can run forever happy, but when I run (what for me is a batch process) operations that delete and modify column values, I run into this.

Reading bug https://issues.apache.org/jira/browse/CASSANDRA-3741, the advice is to NOT do deletes individually but to truncate. I am scrambling to try to do this, but am curious whether it will be worth the effort.

Wade

-----Original Message-----
From: Mina Naguib [mailto:mina.nag...@bloomdigital.com]
Sent: Tuesday, June 05, 2012 2:24 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Hi Wade

I don't know if your scenario matches mine, but I've been struggling with memory pressure in 1.x as well. I made the jump from 0.7.9 to 1.1.0, along with enabling compression and levelled compactions, so I don't know which specifically is the main culprit.

Specifically, all my nodes seem to lose heap memory. As parnew and CMS do their job, over any reasonable period of time, the floor of memory after a GC keeps rising. This is quite visible if you leave jconsole connected for a day or so, and manifests itself as a funny-looking cone like so: http://mina.naguib.ca/images/cassandra_jconsole.png

Once memory pressure reaches a point where the heap can't be maintained reliably below 75%, cassandra goes into survival mode - via a bunch of tunables in cassandra.yaml it'll do things like flush memtables, drop caches, etc - all of which, in my experience, especially with the recent off-heap data structures, exacerbate the problem.

I've been meaning, of course, to collect enough technical data to file a bug report, but haven't had the time. I have not yet tested 1.1.1 to see if it improves the situation.

What I have found, however, is a band-aid, which you see at the rightmost section of the graph in the screenshot I posted. That is simply to hit the "Perform GC" button in jconsole. It seems that a full System.gc() *DOES* reclaim heap memory that parnew and CMS fail to reclaim.

On my production cluster I have a full GC via JMX scheduled in a rolling fashion every 4 hours. It's extremely expensive (20-40 seconds of unresponsiveness) but is a necessary evil in my situation. Without it, my nodes enter a nasty spiral of constant flushing, constant compactions, high heap usage, instability and high latency.

On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:

Alas, upgrading to 1.1.1 did not solve my issue.

-----Original Message-----
From: Brandon Williams [mailto:dri...@gmail.com]
Sent: Monday, June 04, 2012 11:24 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741

-Brandon

On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L <wade.l.poziom...@intel.com> wrote:

Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached.

Curiously, when I add new data I have never seen this (I have in the past sent hundreds of millions of new transactions). It seems to happen when I modify.

My process is as follows: key slice to get columns to modify in batches of 100, then in separate threads modify those columns. I advance the slice each time, with the start key set to the last key in the previous batch. Mutations done are update
Re: memory issue on 1.1.0
Had a look at the log; this message

INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families

appears correct: it happens after some flush activity and there are no CFs with memtable data. But the heap is still full. Overall the server is overloaded, but it seems like it should be handling it better.

What JVM settings do you have?

What is the machine spec?

What settings do you have for key and row cache?

Do the CFs have secondary indexes?

How many clients / requests per second?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote:

Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached.

Curiously, when I add new data I have never seen this (I have in the past sent hundreds of millions of new transactions). It seems to happen when I modify.

My process is as follows: key slice to get columns to modify in batches of 100, then in separate threads modify those columns. I advance the slice each time, with the start key set to the last key in the previous batch. The mutations done are: update a column value in one column family (token), delete a column and add a new column in another (pan).

It runs well until after about 5 million rows, then it seems to run out of memory. Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512

Keyspace: keyspace
  Read Count: 50042632
  Read Latency: 0.23157864418482224 ms.
  Write Count: 44948323
  Write Latency: 0.019460829472992797 ms.
  Pending Tasks: 0
    Column Family: pan
    SSTable count: 5
    Space used (live): 1977467326
    Space used (total): 1977467326
    Number of Keys (estimate): 16334848
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 74
    Read Count: 14985122
    Read Latency: 0.408 ms.
    Write Count: 19972441
    Write Latency: 0.022 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 829
    Bloom Filter False Ratio: 0.00073
    Bloom Filter Space Used: 37048400
    Compacted row minimum size: 125
    Compacted row maximum size: 149
    Compacted row mean size: 149
    Column Family: token
    SSTable count: 4
    Space used (live): 1250973873
    Space used (total): 1250973873
    Number of Keys (estimate): 14217216
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 49
    Read Count: 30059563
    Read Latency: 0.167 ms.
    Write Count: 14985488
    Write Latency: 0.014 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 13642
    Bloom Filter False Ratio: 0.00322
    Bloom Filter Space Used: 28002984
    Compacted row minimum size: 150
    Compacted row maximum size: 258
    Compacted row mean size: 224
    Column Family: counters
    SSTable count: 2
    Space used (live): 561549994
    Space used (total): 561549994
    Number of Keys (estimate): 9985024
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 38
    Read Count: 4997947
    Read Latency: 0.092 ms.
    Write Count: 9990394
    Write Latency: 0.023 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 191
    Bloom Filter False Ratio: 0.37525
    Bloom Filter Space Used: 18741152
    Compacted row
RE: memory issue on 1.1.0
> What JVM settings do you have?

-Xms8G -Xmx8G -Xmn800m
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-Djava.rmi.server.hostname=127.0.0.1
-Djava.net.preferIPv4Stack=true
-Dcassandra-pidfile=cassandra.pid

> What is the machine spec?

It is an RH AS5 x64, 16 GB memory, 2 CPU cores, 2.8 GHz. As it turns out it is somewhat wimpier than I thought. While weak on CPU, it does have a good amount of memory. It is paired with a larger machine.

> What settings do you have for key and row cache?

A: All the defaults. (yaml template attached)

> Do the CFs have secondary indexes?

A: Yes, one has two. One of them is used in the key slice used to get the row keys used to do the further mutations.

> How many clients / requests per second?

A: One client process with 10 threads connected to one of the two nodes in the cluster. One thread reads the slice and puts work in a queue; 9 others read from this queue and apply the mutations. Mutations are completing at roughly 20,000/minute.

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, June 04, 2012 4:17 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Had a look at the log; this message

INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families

appears correct: it happens after some flush activity and there are no CFs with memtable data. But the heap is still full. Overall the server is overloaded, but it seems like it should be handling it better.

What JVM settings do you have?

What is the machine spec?

What settings do you have for key and row cache?

Do the CFs have secondary indexes?

How many clients / requests per second?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote:

Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached.

Curiously, when I add new data I have never seen this (I have in the past sent hundreds of millions of new transactions). It seems to happen when I modify.

My process is as follows: key slice to get columns to modify in batches of 100, then in separate threads modify those columns. I advance the slice each time, with the start key set to the last key in the previous batch. The mutations done are: update a column value in one column family (token), delete a column and add a new column in another (pan).

It runs well until after about 5 million rows, then it seems to run out of memory. Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512

Keyspace: keyspace
  Read Count: 50042632
  Read Latency: 0.23157864418482224 ms.
  Write Count: 44948323
  Write Latency: 0.019460829472992797 ms.
  Pending Tasks: 0
    Column Family: pan
    SSTable count: 5
    Space used (live): 1977467326
    Space used (total): 1977467326
    Number of Keys (estimate): 16334848
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 74
    Read Count: 14985122
    Read Latency: 0.408 ms.
    Write Count: 19972441
    Write Latency: 0.022 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 829
    Bloom Filter False Ratio: 0.00073
    Bloom Filter Space Used: 37048400
    Compacted row minimum size: 125
    Compacted row maximum size: 149
    Compacted row mean size: 149
    Column Family: token
    SSTable count: 4
    Space used (live): 1250973873
    Space used (total
RE: memory issue on 1.1.0
I have repeated the test on two quite large machines (12-core, 64 GB AS5 boxes) and still observed the problem, interestingly at about the same point. Anything I can monitor... perhaps I'll hook the YourKit profiler up to it to see if there is some kind of leak?

Wade

From: Poziombka, Wade L
Sent: Monday, June 04, 2012 7:23 PM
To: user@cassandra.apache.org
Subject: RE: memory issue on 1.1.0

> What JVM settings do you have?

-Xms8G -Xmx8G -Xmn800m
-XX:+HeapDumpOnOutOfMemoryError
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-Djava.rmi.server.hostname=127.0.0.1
-Djava.net.preferIPv4Stack=true
-Dcassandra-pidfile=cassandra.pid

> What is the machine spec?

It is an RH AS5 x64, 16 GB memory, 2 CPU cores, 2.8 GHz. As it turns out it is somewhat wimpier than I thought. While weak on CPU, it does have a good amount of memory. It is paired with a larger machine.

> What settings do you have for key and row cache?

A: All the defaults. (yaml template attached)

> Do the CFs have secondary indexes?

A: Yes, one has two. One of them is used in the key slice used to get the row keys used to do the further mutations.

> How many clients / requests per second?

A: One client process with 10 threads connected to one of the two nodes in the cluster. One thread reads the slice and puts work in a queue; 9 others read from this queue and apply the mutations. Mutations are completing at roughly 20,000/minute.

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, June 04, 2012 4:17 PM
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0

Had a look at the log; this message

INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families

appears correct: it happens after some flush activity and there are no CFs with memtable data. But the heap is still full. Overall the server is overloaded, but it seems like it should be handling it better.

What JVM settings do you have?

What is the machine spec?

What settings do you have for key and row cache?

Do the CFs have secondary indexes?

How many clients / requests per second?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/06/2012, at 11:12 AM, Poziombka, Wade L wrote:

Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached.

Curiously, when I add new data I have never seen this (I have in the past sent hundreds of millions of new transactions). It seems to happen when I modify.

My process is as follows: key slice to get columns to modify in batches of 100, then in separate threads modify those columns. I advance the slice each time, with the start key set to the last key in the previous batch. The mutations done are: update a column value in one column family (token), delete a column and add a new column in another (pan).

It runs well until after about 5 million rows, then it seems to run out of memory. Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512

Keyspace: keyspace
  Read Count: 50042632
  Read Latency: 0.23157864418482224 ms.
  Write Count: 44948323
  Write Latency: 0.019460829472992797 ms.
  Pending Tasks: 0
    Column Family: pan
    SSTable count: 5
    Space used (live): 1977467326
    Space used (total): 1977467326
    Number of Keys (estimate): 16334848
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 74
    Read Count: 14985122
    Read Latency: 0.408 ms.
    Write Count: 19972441
    Write Latency: 0.022 ms.
    Pending Tasks: 0
    Bloom Filter False
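Short of attaching YourKit, a heap dump taken when the heap is nearly full can be analysed offline (Eclipse MAT, jhat) to see what is holding the memory; the -XX:+HeapDumpOnOutOfMemoryError flag already in the JVM settings above will also produce one automatically on OOM. A manual sketch:

    CASS_PID=$(pgrep -f CassandraDaemon | head -1)
    jmap -dump:live,format=b,file=/tmp/cassandra-heap.hprof $CASS_PID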
Re: memory issue on 1.1.0
Perhaps the deletes: https://issues.apache.org/jira/browse/CASSANDRA-3741

-Brandon

On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L <wade.l.poziom...@intel.com> wrote:

Running a very write intensive (new column, delete old column etc.) process and failing on memory. Log file attached.

Curiously, when I add new data I have never seen this (I have in the past sent hundreds of millions of new transactions). It seems to happen when I modify.

My process is as follows: key slice to get columns to modify in batches of 100, then in separate threads modify those columns. I advance the slice each time, with the start key set to the last key in the previous batch. The mutations done are: update a column value in one column family (token), delete a column and add a new column in another (pan).

It runs well until after about 5 million rows, then it seems to run out of memory. Note that these column families are quite small.

WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families
INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP
INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512

Keyspace: keyspace
  Read Count: 50042632
  Read Latency: 0.23157864418482224 ms.
  Write Count: 44948323
  Write Latency: 0.019460829472992797 ms.
  Pending Tasks: 0
    Column Family: pan
    SSTable count: 5
    Space used (live): 1977467326
    Space used (total): 1977467326
    Number of Keys (estimate): 16334848
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 74
    Read Count: 14985122
    Read Latency: 0.408 ms.
    Write Count: 19972441
    Write Latency: 0.022 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 829
    Bloom Filter False Ratio: 0.00073
    Bloom Filter Space Used: 37048400
    Compacted row minimum size: 125
    Compacted row maximum size: 149
    Compacted row mean size: 149
    Column Family: token
    SSTable count: 4
    Space used (live): 1250973873
    Space used (total): 1250973873
    Number of Keys (estimate): 14217216
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 49
    Read Count: 30059563
    Read Latency: 0.167 ms.
    Write Count: 14985488
    Write Latency: 0.014 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 13642
    Bloom Filter False Ratio: 0.00322
    Bloom Filter Space Used: 28002984
    Compacted row minimum size: 150
    Compacted row maximum size: 258
    Compacted row mean size: 224
    Column Family: counters
    SSTable count: 2
    Space used (live): 561549994
    Space used (total): 561549994
    Number of Keys (estimate): 9985024
    Memtable Columns Count: 0
    Memtable Data Size: 0
    Memtable Switch Count: 38
    Read Count: 4997947
    Read Latency: 0.092 ms.
    Write Count: 9990394
    Write Latency: 0.023 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 191
    Bloom Filter False Ratio: 0.37525
    Bloom Filter Space Used: 18741152
    Compacted row minimum size: 125
    Compacted row maximum size: 179
    Compacted row mean size: 150