Upgrade old cluster
Hi, I'd like to upgrade a small cluster running 2.0.17 (on virtual private servers with ssh access) to a newer version of Cassandra, and in the process I'd like to make it easier to update nodes, add nodes, and run maintenance in the future. What tools are commonly used to automate such tasks? Right now everything is done manually. Any suggestions for the best way to do the upgrade? Best regards, Joel
Re: Simple upgrade for outdated cluster
Thank you for your replies! We're at 2.0.17.

On Fri, 3 Aug 2018 at 14:34, Romain Hardouin wrote:

> Also, you didn't mention which C* 2.0 version you're using, but prior to upgrading to 2.1.20, make sure to use the latest 2.0 - or at least >= 2.0.7
>
> On Friday, 3 August 2018 at 13:03:39 UTC+2, Romain Hardouin wrote:
>
>> Hi Joel,
>>
>> No, it's not supported. C* 2.0 can't stream data to C* 3.11. Make the upgrade 2.0 -> 2.1.20, then you'll be able to upgrade to 3.11.3, i.e. 2.1.20 -> 3.11.3. You can upgrade to 3.0.17 as an intermediary step (I would do that), but don't upgrade to 2.2. Also make sure to read carefully https://github.com/apache/cassandra/blob/cassandra-3.11/NEWS.txt It's a long read but it's important. There are lots of changes between all these versions.
>>
>> Best,
>> Romain
>>
>> On Friday, 3 August 2018 at 11:40:26 UTC+2, Joel Samuelsson <samuelsson.j...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> We have a pretty outdated Cassandra cluster running version 2.0.x. Instead of doing step by step upgrades (2.0 -> 2.1, 2.1 -> 2.2, 2.2 -> 3.0, 3.0 -> 3.11.x), would it be possible to add new nodes with a recent version (say 3.11.x) and start decommissioning the old ones until we have a cluster with only 3.11.x?
>>>
>>> Best regards,
>>> Joel
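The hop constraints described in this thread (2.0.17 -> 2.1.20, then optionally 3.0.17, then 3.11.3, never via 2.2) can be encoded and checked mechanically. A minimal sketch; the version graph below is taken from Romain's advice above, and the function itself is just an illustration, not any Cassandra tooling:

```python
# Supported direct upgrade hops for the versions discussed in this thread.
# 2.2 is deliberately excluded, as advised above.
HOPS = {
    "2.0.17": ["2.1.20"],
    "2.1.20": ["3.0.17", "3.11.3"],
    "3.0.17": ["3.11.3"],
}

def upgrade_path(current, target):
    """Return the list of versions to pass through, or None if unreachable."""
    if current == target:
        return [current]
    for nxt in HOPS.get(current, []):
        rest = upgrade_path(nxt, target)
        if rest is not None:
            return [current] + rest
    return None

print(upgrade_path("2.0.17", "3.11.3"))
# ['2.0.17', '2.1.20', '3.0.17', '3.11.3']
```

Because "3.0.17" is listed before "3.11.3" in the hop table, the search prefers the intermediary 3.0.x step, matching the recommendation in the thread; the direct 2.1.20 -> 3.11.3 hop is still reachable.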
Simple upgrade for outdated cluster
Hi, We have a pretty outdated Cassandra cluster running version 2.0.x. Instead of doing step by step upgrades (2.0 -> 2.1, 2.1 -> 2.2, 2.2 -> 3.0, 3.0 -> 3.11.x), would it be possible to add new nodes with a recent version (say 3.11.x) and start decommissioning the old ones until we have a cluster with only 3.11.x? Best regards, Joel
Re: Alter composite column
Yeah, I want column4 to appear in each cell name (rather than just once), which I think would be the same as altering the primary key.

2018-01-18 12:18 GMT+01:00 Nicolas Guyomar <nicolas.guyo...@gmail.com>:

> Well, it should be as easy as following this: https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_alter_add.html
>
> But I'm worried that your initial requirement was to change the clustering key; as Alexander stated, you need to create a new table and transfer your data into it.
>
> On 18 January 2018 at 12:03, Joel Samuelsson <samuelsson.j...@gmail.com> wrote:
>
>> It was indeed created with C* 1.X. Do you have any links or otherwise on how I would add the column4? I don't want to risk destroying my data.
>>
>> Best regards,
>> Joel
>>
>> 2018-01-18 11:18 GMT+01:00 Nicolas Guyomar <nicolas.guyo...@gmail.com>:
>>
>>> Hi Joel,
>>>
>>> You cannot alter a table primary key.
>>>
>>> You can however alter your existing table to only add column4 using cqlsh and cql, even if this table was created back with C* 1.X for instance.
>>>
>>> On 18 January 2018 at 11:14, Joel Samuelsson <samuelsson.j...@gmail.com> wrote:
>>>
>>>> So to rephrase that in CQL terms, I have a table like this:
>>>>
>>>> CREATE TABLE events (
>>>>     key text,
>>>>     column1 int,
>>>>     column2 int,
>>>>     column3 text,
>>>>     value text,
>>>>     PRIMARY KEY (key, column1, column2, column3)
>>>> ) WITH COMPACT STORAGE
>>>>
>>>> and I'd like to change it to:
>>>>
>>>> CREATE TABLE events (
>>>>     key text,
>>>>     column1 int,
>>>>     column2 int,
>>>>     column3 text,
>>>>     column4 text,
>>>>     value text,
>>>>     PRIMARY KEY (key, column1, column2, column3, column4)
>>>> ) WITH COMPACT STORAGE
>>>>
>>>> Is this possible?
>>>>
>>>> Best regards,
>>>> Joel
>>>>
>>>> 2018-01-12 16:53 GMT+01:00 Joel Samuelsson <samuelsson.j...@gmail.com>:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have an older system (C* 2.1) using Thrift tables on which I want to alter a composite column. Right now it looks like (int, int, string) but I want it to be (int, int, string, string). Is it possible to do this on a live cluster without deleting the old data? Can you point me to some documentation about this? I can't seem to find it any more.
>>>>>
>>>>> Best regards,
>>>>> Joel
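Since the clustering key itself can't be altered, the usual route is a new table plus a data copy. A hedged sketch of what that could look like in cqlsh; the table name `events_v2` is made up for illustration, cqlsh's COPY is only sensible for small data sets (a driver script or Spark job is better for large tables), and note that clustering columns can't be null, so every copied row needs some default value for column4:

```sql
CREATE TABLE events_v2 (
    key text,
    column1 int,
    column2 int,
    column3 text,
    column4 text,
    value text,
    PRIMARY KEY (key, column1, column2, column3, column4)
) WITH COMPACT STORAGE;

-- Export the old rows:
COPY events (key, column1, column2, column3, value) TO 'events.csv';

-- Add a default column4 value (e.g. an empty string) to each CSV row,
-- since clustering columns cannot be null, then re-import:
COPY events_v2 (key, column1, column2, column3, column4, value) FROM 'events_with_default.csv';
```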
Re: Alter composite column
It was indeed created with C* 1.X. Do you have any links or otherwise on how I would add the column4? I don't want to risk destroying my data.

Best regards,
Joel

2018-01-18 11:18 GMT+01:00 Nicolas Guyomar <nicolas.guyo...@gmail.com>:

> Hi Joel,
>
> You cannot alter a table primary key.
>
> You can however alter your existing table to only add column4 using cqlsh and cql, even if this table was created back with C* 1.X for instance.
>
> On 18 January 2018 at 11:14, Joel Samuelsson <samuelsson.j...@gmail.com> wrote:
>
>> So to rephrase that in CQL terms, I have a table like this:
>>
>> CREATE TABLE events (
>>     key text,
>>     column1 int,
>>     column2 int,
>>     column3 text,
>>     value text,
>>     PRIMARY KEY (key, column1, column2, column3)
>> ) WITH COMPACT STORAGE
>>
>> and I'd like to change it to:
>>
>> CREATE TABLE events (
>>     key text,
>>     column1 int,
>>     column2 int,
>>     column3 text,
>>     column4 text,
>>     value text,
>>     PRIMARY KEY (key, column1, column2, column3, column4)
>> ) WITH COMPACT STORAGE
>>
>> Is this possible?
>>
>> Best regards,
>> Joel
>>
>> 2018-01-12 16:53 GMT+01:00 Joel Samuelsson <samuelsson.j...@gmail.com>:
>>
>>> Hi,
>>>
>>> I have an older system (C* 2.1) using Thrift tables on which I want to alter a composite column. Right now it looks like (int, int, string) but I want it to be (int, int, string, string). Is it possible to do this on a live cluster without deleting the old data? Can you point me to some documentation about this? I can't seem to find it any more.
>>>
>>> Best regards,
>>> Joel
Re: Alter composite column
So to rephrase that in CQL terms, I have a table like this:

CREATE TABLE events (
    key text,
    column1 int,
    column2 int,
    column3 text,
    value text,
    PRIMARY KEY (key, column1, column2, column3)
) WITH COMPACT STORAGE

and I'd like to change it to:

CREATE TABLE events (
    key text,
    column1 int,
    column2 int,
    column3 text,
    column4 text,
    value text,
    PRIMARY KEY (key, column1, column2, column3, column4)
) WITH COMPACT STORAGE

Is this possible?

Best regards,
Joel

2018-01-12 16:53 GMT+01:00 Joel Samuelsson <samuelsson.j...@gmail.com>:

> Hi,
>
> I have an older system (C* 2.1) using Thrift tables on which I want to alter a composite column. Right now it looks like (int, int, string) but I want it to be (int, int, string, string). Is it possible to do this on a live cluster without deleting the old data? Can you point me to some documentation about this? I can't seem to find it any more.
>
> Best regards,
> Joel
Alter composite column
Hi, I have an older system (C* 2.1) using Thrift tables on which I want to alter a composite column. Right now it looks like (int, int, string) but I want it to be (int, int, string, string). Is it possible to do this on a live cluster without deleting the old data? Can you point me to some documentation about this? I can't seem to find it any more. Best regards, Joel
Re: Safe to run cleanup before repair?
Great, thanks for your replies.

2017-11-12 21:44 GMT+01:00 Jeff Jirsa <jji...@gmail.com>:

> That is: bootstrap will maintain whatever consistency guarantees you had when you started.
>
> --
> Jeff Jirsa
>
> On Nov 12, 2017, at 12:41 PM, kurt greaves <k...@instaclustr.com> wrote:
>
>> By default, bootstrap will stream from the primary replica of the range it is taking ownership of. So Node 3 would have to stream from Node 2 if it was taking ownership of Node 2's tokens.
>>
>> On 13 Nov. 2017 05:00, "Joel Samuelsson" <samuelsson.j...@gmail.com> wrote:
>>
>>> Yeah, sounds right. What I'm worried about is the following: I used to have only 2 nodes with RF 2, so both nodes had a copy of all data. There were inconsistencies since I was unable to run repair, so some parts of the data may only exist on one node. I have now added two nodes, thus changing which nodes own which parts of the data. My concern is that a piece of data is now owned by, say, Node 1 and Node 3, but before the addition of the new nodes only existed on Node 2, and a cleanup would then delete it permanently since Node 2 no longer owns it. Could this ever happen?
Re: Safe to run cleanup before repair?
Yeah, sounds right. What I'm worried about is the following: I used to have only 2 nodes with RF 2, so both nodes had a copy of all data. There were inconsistencies since I was unable to run repair, so some parts of the data may only exist on one node. I have now added two nodes, thus changing which nodes own which parts of the data. My concern is that a piece of data is now owned by, say, Node 1 and Node 3, but before the addition of the new nodes only existed on Node 2, and a cleanup would then delete it permanently since Node 2 no longer owns it. Could this ever happen?
Safe to run cleanup before repair?
So, I have a cluster which grew too large data-wise, so that compactions no longer worked (because of a full disk). I have now added new nodes so that the data is spread more thinly. However, I know there are inconsistencies in the cluster and I need to run a repair, but repairs also fail with out-of-disk errors. Is it safe to run cleanup before I run the repair, or might I lose data because of said inconsistencies?
How to know if bootstrap is still running
I'm trying to add a new node to a small existing cluster. During the bootstrap, one of the nodes went down. I'm not sure at what point in the process the node went down; all files may have been sent before that happened. Currently:

nodetool netstats says that all files are received 100%
nodetool status says that the new node is still joining

How can I know if bootstrap has hung?
Re: Nodes go down periodically
"Is it only one node at a time that goes down, and at widely dispersed times?"

It is a two-node cluster, so both nodes consider the other node down at the same time. These are the times over the latest few days:

INFO [GossipTasks:1] 2016-02-19 05:06:21,087 Gossiper.java (line 992) InetAddress /x.x.x.x is now DOWN
INFO [GossipTasks:1] 2016-02-19 14:33:38,424 Gossiper.java (line 992) InetAddress /x.x.x.x is now DOWN
INFO [GossipTasks:1] 2016-02-20 07:21:25,626 Gossiper.java (line 992) InetAddress /x.x.x.x is now DOWN
INFO [GossipTasks:1] 2016-02-20 11:34:46,766 Gossiper.java (line 992) InetAddress /x.x.x.x is now DOWN
INFO [GossipTasks:1] 2016-02-21 08:00:07,518 Gossiper.java (line 992) InetAddress /x.x.x.x is now DOWN
INFO [GossipTasks:1] 2016-02-21 10:36:58,788 Gossiper.java (line 992) InetAddress /x.x.x.x is now DOWN
INFO [GossipTasks:1] 2016-02-22 07:10:40,304 Gossiper.java (line 992) InetAddress /x.x.x.x is now DOWN
INFO [GossipTasks:1] 2016-02-22 10:05:14,896 Gossiper.java (line 992) InetAddress /x.x.x.x is now DOWN
INFO [GossipTasks:1] 2016-02-23 08:59:05,392 Gossiper.java (line 992) InetAddress /x.x.x.x is now DOWN
INFO [GossipTasks:1] 2016-02-23 12:22:59,562 Gossiper.java (line 992) InetAddress /x.x.x.x is now DOWN

2016-02-23 18:01 GMT+01:00 daemeon reiydelle <daeme...@gmail.com>:

> If you can, do a few runs (short, maybe 10M records; delete the default schema between executions) of the Cassandra stress test against your production cluster (replication=3, force quorum to 3). Look for latency max in the 10s of SECONDS. If your devops team is running a monitoring tool that looks at the network, look for timeouts/retries/errors/lost packets, etc. during the run (worst case you need to do netstats runs against the relevant NIC, e.g. every 10 seconds on the stress node, and look for jumps in this count). If monitoring is enabled, look at the monitor's results for ALL of your nodes. At least one is having some issues.
>
> Daemeon C.M. Reiydelle
> USA (+1) 415.501.0198
> London (+44) (0) 20 8144 9872
>
> On Tue, Feb 23, 2016 at 8:43 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:
>
>> The reality of modern distributed systems is that connectivity between nodes is never guaranteed and distributed software must be able to cope with occasional absence of connectivity. GC and network connectivity are the two issues that a lot of us are most familiar with. There may be others - but most technical problems on a node would be clearly logged on that node. If you see a lapse of connectivity no more than once or twice a day, consider yourselves lucky.
>>
>> Is it only one node at a time that goes down, and at widely dispersed times?
>>
>> How many nodes?
>>
>> -- Jack Krupansky
>>
>> On Tue, Feb 23, 2016 at 11:01 AM, Joel Samuelsson <samuelsson.j...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Version is 2.0.17. Yes, these are VMs in the cloud, though I'm fairly certain they are on a LAN rather than a WAN. They are both in the same data centre physically. The phi_convict_threshold is set to default. I'd rather find the root cause of the problem than just hide it by not convicting a node that isn't responding, though. If pings are <2 ms without a single ping missed in several days, I highly doubt that the network is the reason for the downtime.
>>>
>>> Best regards,
>>> Joel
>>>
>>> 2016-02-23 16:39 GMT+01:00 <sean_r_dur...@homedepot.com>:
>>>
>>>> You didn't mention version, but I saw this kind of thing very often in the 1.1 line. Often this is connected to network flakiness. Are these VMs? In the cloud? Connected over a WAN? You mention that ping seems fine. Take a look at the phi_convict_threshold in cassandra.yaml. You may need to increase it to reduce the UP/DOWN flapping behavior.
>>>>
>>>> Sean Durity
>>>>
>>>> From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
>>>> Sent: Tuesday, February 23, 2016 9:41 AM
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: Nodes go down periodically
>>>>
>>>> Hi,
>>>>
>>>> Thanks for your reply.
>>>>
>>>> I have debug logging on an
Re: Nodes go down periodically
Hi,

Version is 2.0.17. Yes, these are VMs in the cloud, though I'm fairly certain they are on a LAN rather than a WAN. They are both in the same data centre physically. The phi_convict_threshold is set to default. I'd rather find the root cause of the problem than just hide it by not convicting a node that isn't responding, though. If pings are <2 ms without a single ping missed in several days, I highly doubt that the network is the reason for the downtime.

Best regards,
Joel

2016-02-23 16:39 GMT+01:00 <sean_r_dur...@homedepot.com>:

> You didn't mention version, but I saw this kind of thing very often in the 1.1 line. Often this is connected to network flakiness. Are these VMs? In the cloud? Connected over a WAN? You mention that ping seems fine. Take a look at the phi_convict_threshold in cassandra.yaml. You may need to increase it to reduce the UP/DOWN flapping behavior.
>
> Sean Durity
>
> From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
> Sent: Tuesday, February 23, 2016 9:41 AM
> To: user@cassandra.apache.org
> Subject: Re: Nodes go down periodically
>
> Hi,
>
> Thanks for your reply.
>
> I have debug logging on and see no GC pauses that are that long. GC pauses are all well below 1 s, and 99 times out of 100 below 100 ms. Do I need to enable GC log options to see the pauses? I see plenty of these lines:
>
> DEBUG [ScheduledTasks:1] 2016-02-22 10:43:02,891 GCInspector.java (line 118) GC for ParNew: 24 ms for 1 collections
>
> as well as a few CMS GC log lines.
>
> Best regards,
> Joel
>
> 2016-02-23 15:14 GMT+01:00 Hannu Kröger <hkro...@gmail.com>:
>
>> Hi,
>>
>> Those are probably GC pauses. Memory tuning is probably needed. Check the parameters that you have already customised to see if they make sense.
>>
>> http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html
>>
>> Hannu
>>
>> On 23 Feb 2016, at 16:08, Joel Samuelsson <samuelsson.j...@gmail.com> wrote:
>>
>>> Our nodes go down periodically, around 1-2 times each day. Downtime is from <1 second to 30 or so seconds.
>>>
>>> INFO [GossipTasks:1] 2016-02-22 10:05:14,896 Gossiper.java (line 992) InetAddress /109.74.13.67 is now DOWN
>>> INFO [RequestResponseStage:8844] 2016-02-22 10:05:38,331 Gossiper.java (line 978) InetAddress /109.74.13.67 is now UP
>>>
>>> I find nothing odd in the logs around the same time. I logged a ping with a timestamp and checked during the same time and saw nothing weird (ping is less than 2 ms at all times).
>>>
>>> Does anyone have any suggestions as to why this might happen?
>>>
>>> Best regards,
>>> Joel
Re: Nodes go down periodically
Hi,

Thanks for your reply.

I have debug logging on and see no GC pauses that are that long. GC pauses are all well below 1 s, and 99 times out of 100 below 100 ms. Do I need to enable GC log options to see the pauses? I see plenty of these lines:

DEBUG [ScheduledTasks:1] 2016-02-22 10:43:02,891 GCInspector.java (line 118) GC for ParNew: 24 ms for 1 collections

as well as a few CMS GC log lines.

Best regards,
Joel

2016-02-23 15:14 GMT+01:00 Hannu Kröger <hkro...@gmail.com>:

> Hi,
>
> Those are probably GC pauses. Memory tuning is probably needed. Check the parameters that you have already customised to see if they make sense.
>
> http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html
>
> Hannu
>
> On 23 Feb 2016, at 16:08, Joel Samuelsson <samuelsson.j...@gmail.com> wrote:
>
>> Our nodes go down periodically, around 1-2 times each day. Downtime is from <1 second to 30 or so seconds.
>>
>> INFO [GossipTasks:1] 2016-02-22 10:05:14,896 Gossiper.java (line 992) InetAddress /109.74.13.67 is now DOWN
>> INFO [RequestResponseStage:8844] 2016-02-22 10:05:38,331 Gossiper.java (line 978) InetAddress /109.74.13.67 is now UP
>>
>> I find nothing odd in the logs around the same time. I logged a ping with a timestamp and checked during the same time and saw nothing weird (ping is less than 2 ms at all times).
>>
>> Does anyone have any suggestions as to why this might happen?
>>
>> Best regards,
>> Joel
Nodes go down periodically
Our nodes go down periodically, around 1-2 times each day. Downtime is from <1 second to 30 or so seconds.

INFO [GossipTasks:1] 2016-02-22 10:05:14,896 Gossiper.java (line 992) InetAddress /109.74.13.67 is now DOWN
INFO [RequestResponseStage:8844] 2016-02-22 10:05:38,331 Gossiper.java (line 978) InetAddress /109.74.13.67 is now UP

I find nothing odd in the logs around the same time. I logged a ping with a timestamp and checked during the same time and saw nothing weird (ping is less than 2 ms at all times).

Does anyone have any suggestions as to why this might happen?

Best regards,
Joel
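Flap durations for Gossiper lines like the ones above can be computed mechanically by pairing each DOWN line with the next UP line for the same address. A minimal sketch; the log format is taken from the lines quoted in this thread, and anything beyond that is an assumption:

```python
import re
from datetime import datetime

# Matches the Gossiper.java DOWN/UP lines shown above.
LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*InetAddress /(\S+) is now (DOWN|UP)"
)

def flap_durations(log_lines):
    """Pair each DOWN with the next UP for the same address; return seconds down."""
    down_since = {}
    durations = []
    for line in log_lines:
        m = LINE.search(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        addr, state = m.group(2), m.group(3)
        if state == "DOWN":
            down_since[addr] = ts
        elif addr in down_since:
            durations.append((addr, (ts - down_since.pop(addr)).total_seconds()))
    return durations

log = [
    "INFO [GossipTasks:1] 2016-02-22 10:05:14,896 Gossiper.java (line 992) InetAddress /109.74.13.67 is now DOWN",
    "INFO [RequestResponseStage:8844] 2016-02-22 10:05:38,331 Gossiper.java (line 978) InetAddress /109.74.13.67 is now UP",
]
print(flap_durations(log))  # [('109.74.13.67', 23.435)]
```

Run over a few days of system.log this gives a quick picture of how long and how often each node is convicted.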
Logging of triggers
I'm testing triggers as part of a project and would like to add some logging to it. I'm using the same log structure as in the trigger example InvertedIndex but can't seem to find any logs. Where would I find the logging? In the system logs or somewhere else? /Joel
Re: Logging of triggers
I found that I was logging at too low a log level, so the messages were filtered out of the system log. Logging at a more critical log level made the log messages appear in the system log. /Joel

2014-06-03 16:30 GMT+02:00 Joel Samuelsson samuelsson.j...@gmail.com:

I'm testing triggers as part of a project and would like to add some logging to them. I'm using the same log structure as in the trigger example InvertedIndex but can't seem to find any logs. Where would I find the logging? In the system logs or somewhere else? /Joel
Tombstones on secondary indexes
My system log is full of messages like this one:

WARN [ReadStage:42] 2014-05-15 08:19:13,615 SliceQueryFilter.java (line 210) Read 0 live and 2829 tombstoned cells in TrafficServer.rawData.rawData_evaluated_idx (see tombstone_warn_threshold)

I've run a major compaction but the tombstones are not removed. https://issues.apache.org/jira/browse/CASSANDRA-4314 seems to say that tombstones on secondary indexes are not removed by a compaction. Do I need to do it manually?

Best regards,
Joel Samuelsson
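One manual option (an assumption based on general practice, not something stated in this thread) is to rebuild the secondary index from the base table, which rewrites the index SSTables, or to drop and recreate the index. A command sketch, with the keyspace/table/index names taken from the log line above; note the exact index-name form accepted by rebuild_index varies between Cassandra versions:

```shell
# Rebuild the secondary index from the base table data
nodetool rebuild_index TrafficServer rawData rawData_evaluated_idx
```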
Re: Weird timeouts
I am on Cassandra 2.0.5. How can I use the trace functionality? I did not check for exceptions. I will rerun and check. Thanks for the suggestions. /Joel

2014-03-07 17:54 GMT+01:00 Duncan Sands duncan.sa...@gmail.com:

Hi Joel,

On 07/03/14 15:22, Joel Samuelsson wrote:

I try to fetch all the row keys from a column family (there should only be a couple of hundred in that CF) in several different ways, but I get timeouts whichever way I try:

did you check the node logs for exceptions? You can get this kind of thing if there is an assertion failure when reading a particular row, due to corruption for example.

Ciao, Duncan.

Through the cassandra-cli:

Fetching 45 rows is fine:
list cf limit 45 columns 0;
...
45 Rows Returned.
Elapsed time: 298 msec(s).

Fetching 46 rows however gives me a timeout after a minute or so:
list cf limit 46 columns 0;
null
TimedOutException()...

Through pycassa:

keys = cf.get_range(column_count = 1, buffer_size = 2)
for key, val in keys:
    print key

This prints some keys and then gets stuck at the same place each time, and then times out. The columns (column name + value) in the rows should be less than 100 bytes each, though there may be a lot of them in a particular row. To me it seems like one of the rows takes too long to fetch, but I don't know why, since I am limiting the number of columns to 0. Without seeing the row, I have a hard time knowing what could be wrong. Do you have any ideas?
Weird timeouts
I try to fetch all the row keys from a column family (there should only be a couple of hundred in that CF) in several different ways, but I get timeouts whichever way I try.

Through the cassandra-cli:

Fetching 45 rows is fine:
list cf limit 45 columns 0;
...
45 Rows Returned.
Elapsed time: 298 msec(s).

Fetching 46 rows however gives me a timeout after a minute or so:
list cf limit 46 columns 0;
null
TimedOutException()...

Through pycassa:

keys = cf.get_range(column_count = 1, buffer_size = 2)
for key, val in keys:
    print key

This prints some keys and then gets stuck at the same place each time, and then times out. The columns (column name + value) in the rows should be less than 100 bytes each, though there may be a lot of them in a particular row. To me it seems like one of the rows takes too long to fetch, but I don't know why, since I am limiting the number of columns to 0. Without seeing the row, I have a hard time knowing what could be wrong. Do you have any ideas?
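To narrow down which row is pathological, one can time each single-row fetch individually. A hedged sketch: `fetch` here is a stand-in for whatever single-key call your client provides (e.g. a wrapper around a pycassa get), not an actual pycassa API, and the stub below merely simulates one slow row:

```python
import time

def time_each_key(keys, fetch):
    """Time fetch(key) for each key; return (key, seconds) pairs, slowest first."""
    timings = []
    for key in keys:
        start = time.monotonic()
        try:
            fetch(key)
        except Exception as exc:  # a corrupt or huge row may time out or throw
            print("key %r failed: %s" % (key, exc))
        timings.append((key, time.monotonic() - start))
    return sorted(timings, key=lambda kv: kv[1], reverse=True)

# Usage with a stub fetch that simulates one pathological row:
def fake_fetch(key):
    if key == "bad":
        time.sleep(0.05)

worst = time_each_key(["a", "bad", "b"], fake_fetch)
print(worst[0][0])  # the slowest key
```

With a real client, feeding in the keys printed by get_range before it hangs would point straight at the row where the iteration stalls.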
Re: Intermittent long application pauses on nodes
What happens if a ParNew is triggered while CMS is running? Will it wait for the CMS to finish? If so, that would be the explanation of our long ParNew above.

Regards,
Joel

2014-02-20 16:29 GMT+01:00 Joel Samuelsson samuelsson.j...@gmail.com:

Hi Frank,

We got a (quite) long GC pause today on 2.0.5:

INFO [ScheduledTasks:1] 2014-02-20 13:51:14,528 GCInspector.java (line 116) GC for ParNew: 1627 ms for 1 collections, 425562984 used; max is 4253024256
INFO [ScheduledTasks:1] 2014-02-20 13:51:14,542 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 3703 ms for 2 collections, 434394920 used; max is 4253024256

Unfortunately it's a production cluster, so I have no additional GC logging enabled. This may be an indication that upgrading is not the (complete) solution.

Regards,
Joel

2014-02-17 13:41 GMT+01:00 Benedict Elliott Smith belliottsm...@datastax.com:

Hi Ondrej,

It's possible you were hit by the problems in this thread before, but it looks potentially like you may have other issues. Of course it may be that on G1 you have one issue and CMS another, but 27s is extreme even for G1, so it seems unlikely.

If you're hitting these pause times in CMS and you get some more output from the safepoint tracing, please do contribute, as I would love to get to the bottom of that. However, is it possible you're experiencing paging activity? Have you made certain the VM memory is locked (and preferably that paging is entirely disabled, as the bloom filters and other memory won't be locked, although that shouldn't cause pauses during GC)?

Note that mmapped file accesses and other native work shouldn't in any way inhibit GC activity or other safepoint pause times, unless there's a bug in the VM. These threads will simply enter a safepoint as they return to the VM execution context, and are considered safe for the duration they are outside.
On 17 February 2014 12:30, Ondřej Černoš cern...@gmail.com wrote:

Hi,

we tried to switch to G1 because we observed this behaviour on CMS too (a 27 second pause in G1 is quite a strong argument not to use it). Pauses with CMS were not easily traceable - the JVM stopped even without a stop-the-world pause scheduled (defragmentation, remarking). We thought the go-to-safepoint waiting time might have been involved (we saw "waiting for safepoint resolution") - especially because access to mmapped files is not preemptive, afaik - but it doesn't explain tens of seconds of waiting; even slow IO should read our sstables into memory in much less time. We switched to G1 out of desperation - and to try different code paths - not because we thought it was a great idea. So I think we were hit by the problem discussed in this thread, just the G1 report wasn't very clear, sorry.

regards,
ondrej

On Mon, Feb 17, 2014 at 11:45 AM, Benedict Elliott Smith belliottsm...@datastax.com wrote:

Ondrej,

It seems like your issue is much less difficult to diagnose: your collection times are long. At least, the pause you printed the time for is all attributable to the G1 pause.

Note that G1 has not generally performed well with Cassandra in our testing. There are a number of changes going in soon that may change that, but for the time being it is advisable to stick with CMS. With tuning you can no doubt bring your pauses down considerably.

On 17 February 2014 10:17, Ondřej Černoš cern...@gmail.com wrote:

Hi all,

we are seeing the same kind of long pauses in Cassandra. We tried to switch CMS to G1 without positive result. The stress test is read heavy, 2 datacenters, 6 nodes, 400 reqs/sec on one datacenter. We see spikes in latency on the 99.99 percentile and higher, caused by threads being stopped in the JVM.
The GC in G1 looks like this:

{Heap before GC invocations=4073 (full 1):
 garbage-first heap total 8388608K, used 3602914K [0x0005f5c0, 0x0007f5c0, 0x0007f5c0)
  region size 4096K, 142 young (581632K), 11 survivors (45056K)
 compacting perm gen total 28672K, used 27428K [0x0007f5c0, 0x0007f780, 0x0008)
   the space 28672K, 95% used [0x0007f5c0, 0x0007f76c9108, 0x0007f76c9200, 0x0007f780)
No shared spaces configured.
2014-02-17T04:44:16.385+0100: 222346.218: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 37748736 bytes, new threshold 15 (max 15)
- age 1: 17213632 bytes, 17213632 total
- age 2: 19391208 bytes, 36604840 total
, 0.1664300 secs]
 [Parallel Time: 163.9 ms, GC Workers: 2]
  [GC Worker Start (ms): Min: 222346218.3, Avg: 222346218.3, Max: 222346218.3, Diff: 0.0]
  [Ext Root Scanning (ms): Min: 6.0, Avg: 6.9, Max: 7.7, Diff: 1.7, Sum: 13.7]
  [Update RS (ms): Min: 20.4, Avg: 21.3, Max: 22.1, Diff: 1.7, Sum: 42.6]
   [Processed Buffers: Min: 49, Avg: 60.0, Max: 71, Diff: 22, Sum: 120]
  [Scan RS (ms): Min: 23.2, Avg: 23.2, Max: 23.3, Diff: 0.1, Sum: 46.5]
  [Object Copy (ms): Min: 112.3
Re: Intermittent long application pauses on nodes
: 163.8, Diff: 0.0, Sum: 327.6]
  [GC Worker End (ms): Min: 222346382.1, Avg: 222346382.1, Max: 222346382.1, Diff: 0.0]
 [Code Root Fixup: 0.0 ms]
 [Clear CT: 0.4 ms]
 [Other: 2.1 ms]
  [Choose CSet: 0.0 ms]
  [Ref Proc: 1.1 ms]
  [Ref Enq: 0.0 ms]
  [Free CSet: 0.4 ms]
 [Eden: 524.0M(524.0M)->0.0B(476.0M) Survivors: 44.0M->68.0M Heap: 3518.5M(8192.0M)->3018.5M(8192.0M)]
Heap after GC invocations=4074 (full 1):
 garbage-first heap total 8388608K, used 3090914K [0x0005f5c0, 0x0007f5c0, 0x0007f5c0)
  region size 4096K, 17 young (69632K), 17 survivors (69632K)
 compacting perm gen total 28672K, used 27428K [0x0007f5c0, 0x0007f780, 0x0008)
   the space 28672K, 95% used [0x0007f5c0, 0x0007f76c9108, 0x0007f76c9200, 0x0007f780)
No shared spaces configured.
}
[Times: user=0.35 sys=0.00, real=27.58 secs]
222346.219: G1IncCollectionPause [ 111 0 0 ] [ 0 0 0 0 27586 ] 0

And the total time for which application threads were stopped is 27.58 seconds.

CMS behaves in a similar manner. We thought it would be GC waiting for mmapped files being read from disk (the thread cannot reach a safepoint during this operation), but it doesn't explain the huge time. We'll try jHiccup to see if it provides any additional information.

The test was done on a mixed aws/openstack environment, openjdk 1.7.0_45, cassandra 1.2.11. Upgrading to 2.0.x is no option for us.

regards,
ondrej cernos

On Fri, Feb 14, 2014 at 8:53 PM, Frank Ng fnt...@gmail.com wrote:

Sorry, I have not had a chance to file a JIRA ticket. We have not been able to resolve the issue. But since Joel mentioned that upgrading to Cassandra 2.0.X solved it for them, we may need to upgrade. We are currently on Java 1.7 and Cassandra 1.2.8

On Thu, Feb 13, 2014 at 12:40 PM, Keith Wright kwri...@nanigans.com wrote:

You're running 2.0.* in production? May I ask what C* version and OS? Any hardware details would be appreciated as well. Thx!
From: Joel Samuelsson samuelsson.j...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Thursday, February 13, 2014 at 11:39 AM
To: user@cassandra.apache.org
Subject: Re: Intermittent long application pauses on nodes

We have had similar issues, and upgrading C* to 2.0.x and Java to 1.7 seems to have helped our issues.

2014-02-13 Keith Wright kwri...@nanigans.com:

Frank, did you ever file a ticket for this issue or find the root cause? I believe we are seeing the same issues when attempting to bootstrap. Thanks

From: Robert Coli rc...@eventbrite.com
Reply-To: user@cassandra.apache.org
Date: Monday, February 3, 2014 at 6:10 PM
To: user@cassandra.apache.org
Subject: Re: Intermittent long application pauses on nodes

On Mon, Feb 3, 2014 at 8:52 AM, Benedict Elliott Smith belliottsm...@datastax.com wrote:

It's possible that this is a JVM issue, but if so there may be some remedial action we can take anyway. There are some more flags we should add, but we can discuss that once you open a ticket. If you could include the strange JMX error as well, that might be helpful.

It would be appreciated if you could inform this thread of the JIRA ticket number, for the benefit of the community and google searchers. :)

=Rob
Re: Intermittent long application pauses on nodes
We have had similar issues, and upgrading C* to 2.0.x and Java to 1.7 seems to have helped our issues.

2014-02-13 Keith Wright kwri...@nanigans.com:

Frank, did you ever file a ticket for this issue or find the root cause? I believe we are seeing the same issues when attempting to bootstrap. Thanks

From: Robert Coli rc...@eventbrite.com
Reply-To: user@cassandra.apache.org
Date: Monday, February 3, 2014 at 6:10 PM
To: user@cassandra.apache.org
Subject: Re: Intermittent long application pauses on nodes

On Mon, Feb 3, 2014 at 8:52 AM, Benedict Elliott Smith belliottsm...@datastax.com wrote:

It's possible that this is a JVM issue, but if so there may be some remedial action we can take anyway. There are some more flags we should add, but we can discuss that once you open a ticket. If you could include the strange JMX error as well, that might be helpful.

It would be appreciated if you could inform this thread of the JIRA ticket number, for the benefit of the community and google searchers. :)

=Rob
Re: Weird GC
Thanks for your help. I've added those flags, as well as some others I saw in another thread that redirect stdout to a file. What information is it that you need?

2014-01-29 Benedict Elliott Smith belliottsm...@datastax.com:

It's possible the time attributed to GC is actually spent somewhere else; a multitude of tasks may occur during the same safepoint as a GC. We've seen some batch revoke of biased locks take a long time, for instance; *if* this is happening in your case, and we can track down which objects, I would consider it a bug and we may be able to fix it.

-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1

On 29 January 2014 16:23, Joel Samuelsson samuelsson.j...@gmail.com wrote:

Hi,

We've been trying to figure out why we have such long and frequent stop-the-world GC even though we have basically no load. Today we got a log of a weird GC that I wonder if you have any theories about. A plot of our heap at the time, paired with the GC time from the Cassandra log: http://imgur.com/vw5rOzj

- The blue line is the ratio of Eden space used (i.e. 1.0 = full)
- The red line is the ratio of Survivor0 space used
- The green line is the ratio of Survivor1 space used
- The teal line is the ratio of Old Gen space used
- The pink line shows during which period of time a GC happened (from the Cassandra log)

Eden space is filling up and being cleared as expected in the first and last hill, but in the middle one it takes two seconds to clear Eden (note that Eden stays at ratio 1 for 2 seconds). Neither the survivor spaces nor the old generation increase significantly afterwards. Any ideas why this might be happening? We have swap disabled, JNA enabled, no CPU spikes at the time, no disk I/O spikes at the time. What else could be causing this?

/Joel Samuelsson
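For reference, the flags Benedict mentions can be appended next to the existing JVM_OPTS lines in cassandra-env.sh. A minimal sketch; the file path defaults to the current directory purely for illustration, so point ENV_FILE at your real cassandra-env.sh:

```shell
# Append Benedict's safepoint diagnostics to cassandra-env.sh.
# ENV_FILE is an assumption -- set it to the actual path for your install.
ENV_FILE="${ENV_FILE:-cassandra-env.sh}"

cat >> "$ENV_FILE" <<'EOF'
# Log what every JVM safepoint did, not just GC, so non-GC pauses show up
JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
JVM_OPTS="$JVM_OPTS -XX:PrintSafepointStatisticsCount=1"
EOF
```

With these flags the JVM prints a per-safepoint summary (operation type, threads spinning/blocking, vmop time), which is what lets you see pauses that safepoint accounting would otherwise hide inside "GC time".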
Weird GC
Hi,

We've been trying to figure out why we have such long and frequent stop-the-world GC even though we have basically no load. Today we got a log of a weird GC that I wonder if you have any theories about. A plot of our heap at the time, paired with the GC time from the Cassandra log: http://imgur.com/vw5rOzj

- The blue line is the ratio of Eden space used (i.e. 1.0 = full)
- The red line is the ratio of Survivor0 space used
- The green line is the ratio of Survivor1 space used
- The teal line is the ratio of Old Gen space used
- The pink line shows during which period of time a GC happened (from the Cassandra log)

Eden space is filling up and being cleared as expected in the first and last hill, but in the middle one it takes two seconds to clear Eden (note that Eden stays at ratio 1 for 2 seconds). Neither the survivor spaces nor the old generation increase significantly afterwards. Any ideas why this might be happening? We have swap disabled, JNA enabled, no CPU spikes at the time, no disk I/O spikes at the time. What else could be causing this?

/Joel Samuelsson
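For correlating a plot like this with the Cassandra log, the GCInspector lines can be pulled out of system.log with a little awk. A sketch: the line format is copied from the GCInspector output quoted in this thread, and the sample file name is illustrative.

```shell
# Extract "timestamp pause-ms" pairs from GCInspector log lines so they can
# be lined up against a heap plot. Sample lines taken from this thread.
cat > sample-system.log <<'EOF'
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,490 GCInspector.java (line 122) GC for ParNew: 145189 ms for 1 collections, 225905072 used; max is 4114612224
 INFO [ScheduledTasks:1] 2014-01-18 10:54:42,286 GCInspector.java (line 116) GC for ParNew: 464 ms for 1 collections, 102838776 used; max is 4106223616
EOF

# $3/$4 are the date and time fields; the token after "ParNew:" is the pause
awk '/GCInspector/ && /GC for ParNew:/ {
  for (i = 1; i <= NF; i++) if ($i == "ParNew:") print $3, $4, $(i+1) " ms"
}' sample-system.log
```

Run against a real system.log this gives one "date time pause ms" line per collection, which is easy to feed into a plotting tool.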
Re: Extremely long GC
Here is one example. 12GB data, no load besides OpsCenter and perhaps 1-2 requests per minute.

INFO [ScheduledTasks:1] 2013-12-29 01:03:25,381 GCInspector.java (line 119) GC for ParNew: 426400 ms for 1 collections, 2253360864 used; max is 4114612224

2014/1/22 Yogi Nerella ynerella...@gmail.com

Hi, can you share the GC logs for the systems you are running into problems with? Yogi

On Wed, Jan 22, 2014 at 6:50 AM, Joel Samuelsson samuelsson.j...@gmail.com wrote:

Hello,

We've been having problems with long GC pauses and can't seem to get rid of them. Our latest test is on a clean machine with Ubuntu 12.04 LTS, Java 1.7.0_45 and JNA installed. It is a single node cluster with most settings at their defaults; the only things changed are IP addresses, cluster name and partitioner (RandomPartitioner). We are running Cassandra 2.0.4 on a virtual machine with Xen. We have 16GB of RAM and default memory settings for C* (i.e. a heap size of 4GB). The CPU is specified as 8 cores by our provider. Right now, we have no data on the machine and no requests to it at all. Still we get ParNew GCs like the following:

INFO [ScheduledTasks:1] 2014-01-18 10:54:42,286 GCInspector.java (line 116) GC for ParNew: 464 ms for 1 collections, 102838776 used; max is 4106223616

While this may not be extremely long, on other machines with the same setup but some data (around 12GB) and around 10 read requests/s (i.e. basically no load) we have seen ParNew GC for 20 minutes or more. During this time, the machine goes down completely (I can't even ssh to it). The requests are mostly from OpsCenter and the rows requested are not extremely large (typically less than 1KB). We have tried a lot of different things to solve these issues since we've been having them for a long time, including:

- Upgrading Cassandra to new versions
- Upgrading Java to new versions
- Printing promotion failures in the GC log (no failures found!)
- Different sizes of heap and heap space for different GC spaces (Eden etc.)
- Different versions of Ubuntu
- Running on Amazon EC2 instead of the provider we are using now (not with the Datastax AMI)

Something that may be a clue: when running the DataStax Community AMI on Amazon we haven't seen the GC pauses yet (it's been running for a week or so). Just to be clear, the other test on Amazon EC2 mentioned above (without the Datastax AMI) does show the GC freezes. If any other information is needed, just let me know.

Best regards,
Joel Samuelsson
Recurring actions with 4 hour interval
Hello,

We've had a lot of problems with extremely long GC pauses (and still do), which I've asked about several times on this list (I can find links to those discussions if anyone is interested). We noticed a pattern suggesting the GC pauses may be related to something happening every 4 hours. Is there anything specific happening within Cassandra at a 4-hour interval?

Any help is much appreciated,
Joel Samuelsson
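One way to check for a fixed interval is to bucket the GCInspector events by hour of day; a 4-hour cycle shows up as occupied buckets spaced four apart. A sketch against the system.log format quoted elsewhere in this thread (the sample lines are fabricated just to show the shape):

```shell
# Count long-GC events per hour of day; a 4-hour recurrence appears as
# buckets spaced 4 apart (e.g. 02:00, 06:00, 10:00, ...).
cat > sample-hourly.log <<'EOF'
 INFO [ScheduledTasks:1] 2013-06-17 02:01:11,000 GCInspector.java (line 122) GC for ParNew: 95000 ms for 1 collections, 1 used; max is 2
 INFO [ScheduledTasks:1] 2013-06-17 06:02:30,000 GCInspector.java (line 122) GC for ParNew: 88000 ms for 1 collections, 1 used; max is 2
 INFO [ScheduledTasks:1] 2013-06-17 10:03:45,000 GCInspector.java (line 122) GC for ParNew: 91000 ms for 1 collections, 1 used; max is 2
EOF

# $4 is the time field; keep only its hour component
awk '/GC for ParNew/ { split($4, t, ":"); count[t[1]]++ }
     END { for (h in count) print h ":00", count[h] }' sample-hourly.log | sort
```

If the buckets line up on a 4-hour stride, the next step would be to compare those times against cron jobs, snapshots, OpsCenter rollups, or hypervisor-side maintenance windows.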
Re: Reduce Cassandra GC
12.3 GB data per node (only one node). 16GB RAM. In a virtual environment with the CPU specified as 8 cores; average CPU use is close to 0% (basically no load, around 12 requests/sec, mostly from OpsCenter). Average memory use is 4GB. Around 1GB heap used by Cassandra (out of 4GB).

2013/6/19 Mohit Anchlia mohitanch...@gmail.com

How much data do you have per node? How much RAM per node? How much CPU per node? What is the avg CPU and memory usage?

On Wed, Jun 19, 2013 at 12:16 AM, Joel Samuelsson samuelsson.j...@gmail.com wrote:

My Cassandra ps info: (the full jsvc process listing is quoted in the original message below)
Re: Reduce Cassandra GC
My Cassandra ps info: root 26791 1 0 07:14 ?00:00:00 /usr/bin/jsvc -user cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile /var/run/cassandra.pid -errfile 1 -outfile /var/log/cassandra/output.log -cp /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false org.apache.cassandra.service.CassandraDaemon 103 26792 26791 99 07:14 ?854015-22:02:22 /usr/bin/jsvc -user cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile /var/run/cassandra.pid -errfile 1 -outfile /var/log/cassandra/output.log -cp /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/li
b/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false
Re: Reduce Cassandra GC
/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -XX:HeapDumpPath=/var/lib/cassandra/java_1371632342.hprof -XX:ErrorFile=/var/lib/cassandra/hs_err_1371632342.log -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false org.apache.cassandra.service.CassandraDaemon joel 5502 9763 0 08:59 pts/200:00:00 grep --color=auto cassandra Can the two processes have anything to do with my issues? 2013/6/19 Takenori Sato ts...@cloudian.com GC options are not set. You should see the followings. -XX:+PrintGCDateStamps -XX:+PrintPromotionFailure -Xloggc:/var/log/cassandra/gc-1371603607.log Is it normal to have two processes like this? No. You are running two processes. 
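Takenori's "two processes" diagnosis can be checked quickly from a shell. A sketch; note that jsvc normally shows a small controller process plus the actual daemon, so the threshold of 2 is an assumption about jsvc packaging:

```shell
# Count processes matching the Cassandra daemon class. The bracket trick
# stops grep from matching its own command line in the ps output.
count=$(ps -ef | grep -c '[C]assandraDaemon' || true)
echo "processes matching CassandraDaemon: $count"

# jsvc typically runs a controller + daemon pair; more than that suggests
# a stale JVM survived a restart and is competing for memory and ports.
if [ "$count" -gt 2 ]; then
  echo "suspicious: more than one jsvc controller/daemon pair is running"
fi
```

A stray second daemon is worth ruling out early, since two JVMs with 4GB heaps on one 16GB box would explain both memory pressure and erratic pauses.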
Re: Reduce Cassandra GC
StatusLogger.java (line 116) testing_Keyspace.cf22 0,0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line 116) OpsCenter.rollups7200 0,0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line 116) OpsCenter.rollups86400 0,0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line 116) OpsCenter.rollups60 13745,3109686
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line 116) OpsCenter.events 18,826
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,518 StatusLogger.java (line 116) OpsCenter.rollups300 2516,570931
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,519 StatusLogger.java (line 116) OpsCenter.pdps 9072,160850
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,519 StatusLogger.java (line 116) OpsCenter.events_timeline 3,86
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,520 StatusLogger.java (line 116) OpsCenter.settings 0,0

And from gc-1371454124.log I get:

2013-06-17T08:11:22.300+: 2551.288: [GC 870971K->216494K(4018176K), 145.1887460 secs]

2013/6/18 Takenori Sato ts...@cloudian.com

Find the promotion failure. Bingo if it happened at the time. Otherwise, post the relevant portion of the log here. Someone may find a hint.

On Mon, Jun 17, 2013 at 5:51 PM, Joel Samuelsson samuelsson.j...@gmail.com wrote:

Just got a very long GC again. What am I to look for in the logging I just enabled?

2013/6/17 Joel Samuelsson samuelsson.j...@gmail.com

> If you are talking about 1.2.x then I also have memory problems on the idle cluster: java memory constantly slowly grows up to the limit, then spends a long time in GC. I have never seen such behaviour for 1.0.x and 1.1.x, where on an idle cluster java memory stays at the same value.

No, I am running Cassandra 1.1.8.

> Can you paste your gc config?

I believe the relevant configs are these:

# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

I haven't changed anything in the environment config up until now.

> Also can you take a heap dump at 2 diff points so that we can compare it?

I can't access the machine at all during the stop-the-world freezes. Was that what you wanted me to try?

> Uncomment the followings in cassandra-env.sh.

Done. Will post results as soon as I get a new stop-the-world gc.

> If you are unable to find a JIRA, file one

Unless this turns out to be a problem on my end, I will.
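Takenori's advice to "find promotion failure" amounts to grepping the -Xloggc output for the CMS failure markers. A sketch with fabricated log lines: the file name and the second sample line's exact format are illustrative, not taken from this thread.

```shell
# Look for the two CMS failure modes that cause long full GCs.
# sample-gc.log stands in for /var/log/cassandra/gc-<timestamp>.log.
cat > sample-gc.log <<'EOF'
2013-06-17T08:11:22.300+0000: 2551.288: [GC 870971K->216494K(4018176K), 145.1887460 secs]
2013-06-17T09:00:00.000+0000: 5469.000: [GC [ParNew (promotion failed): 753920K->753920K(819200K), 3.2 secs]
EOF

grep -nE 'promotion failed|concurrent mode failure' sample-gc.log \
  || echo "no promotion failures found"
```

A hit at the same timestamp as a reported pause would point at old-gen fragmentation or an undersized heap; no hits (as Joel reports) means the long safepoints need another explanation.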
Re: Reduce Cassandra GC
Yes, like I said, the only relevant output from that file was:

2013-06-17T08:11:22.300+: 2551.288: [GC 870971K->216494K(4018176K), 145.1887460 secs]

2013/6/18 Takenori Sato ts...@cloudian.com

GC logging is not in system.log, but in the following file:

JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"

At least, no GC logs are shown in your post.

On Tue, Jun 18, 2013 at 5:05 PM, Joel Samuelsson samuelsson.j...@gmail.com wrote:

Can't find any promotion failure. In system.log this is what I get:

INFO [ScheduledTasks:1] 2013-06-17 08:13:47,490 GCInspector.java (line 122) GC for ParNew: 145189 ms for 1 collections, 225905072 used; max is 4114612224
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,490 StatusLogger.java (line 57) Pool Name  Active  Pending  Blocked
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,491 StatusLogger.java (line 72) ReadStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line 72) RequestResponseStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line 72) ReadRepairStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line 72) MutationStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line 72) ReplicateOnWriteStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line 72) GossipStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line 72) AntiEntropyStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line 72) MigrationStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line 72) StreamStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line 72) MemtablePostFlusher 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,495 StatusLogger.java (line 72) FlushWriter 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,495 StatusLogger.java (line 72) MiscStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line 72) commitlog_archiver 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line 72) InternalResponseStage 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line 72) HintedHandoff 0 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line 77) CompactionManager 0 0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line 89) MessagingService n/a 0,0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line 99) Cache Type  Size  Capacity  KeysToSave  Provider
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,504 StatusLogger.java (line 100) KeyCache 12129 2184533 all
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line 106) RowCache 0 0 all org.apache.cassandra.cache.SerializingCacheProvider
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line 113) ColumnFamily  Memtable ops,data
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line 116) system.NodeIdInfo 0,0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line 116) system.IndexInfo 0,0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line 116) system.LocationInfo 0,0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line 116) system.Versions 3,103
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line 116) system.schema_keyspaces 0,0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line 116) system.Migrations 0,0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line 116) system.schema_columnfamilies 0,0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line 116) system.schema_columns 0,0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line 116) system.HintsColumnFamily 0,0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line 116) system.Schema 0,0
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line 116
Re: Reduce Cassandra GC
> If you are talking about 1.2.x then I also have memory problems on the idle cluster: java memory constantly slowly grows up to the limit, then spends a long time in GC. I have never seen such behaviour for 1.0.x and 1.1.x, where on an idle cluster java memory stays at the same value.

No, I am running Cassandra 1.1.8.

> Can you paste your gc config?

I believe the relevant configs are these:

# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

I haven't changed anything in the environment config up until now.

> Also can you take a heap dump at 2 diff points so that we can compare it?

I can't access the machine at all during the stop-the-world freezes. Was that what you wanted me to try?

> Uncomment the followings in cassandra-env.sh.

Done. Will post results as soon as I get a new stop-the-world gc.

> If you are unable to find a JIRA, file one

Unless this turns out to be a problem on my end, I will.
Re: Reduce Cassandra GC
Just got a very long GC again. What am I to look for in the logging I just enabled?

2013/6/17 Joel Samuelsson samuelsson.j...@gmail.com

> If you are talking about 1.2.x then I also have memory problems on the idle cluster: java memory constantly slowly grows up to the limit, then spends a long time in GC. I have never seen such behaviour for 1.0.x and 1.1.x, where on an idle cluster java memory stays at the same value.

No, I am running Cassandra 1.1.8.

> Can you paste your gc config?

I believe the relevant configs are these:

# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

I haven't changed anything in the environment config up until now.

> Also can you take a heap dump at 2 diff points so that we can compare it?

I can't access the machine at all during the stop-the-world freezes. Was that what you wanted me to try?

> Uncomment the followings in cassandra-env.sh.

Done. Will post results as soon as I get a new stop-the-world gc.

> If you are unable to find a JIRA, file one

Unless this turns out to be a problem on my end, I will.
Re: Reduce Cassandra GC
I keep having issues with GC. Besides the cluster mentioned above, we also have a single node development cluster having the same issues. This node has 12.33 GB data, a couple of million skinny rows and basically no load. It has default memory settings but keep getting very long stop-the-world GC pauses: INFO [ScheduledTasks:1] 2013-06-07 10:37:02,537 GCInspector.java (line 122) GC for ParNew: 99342 ms for 1 collections, 1400754488 used; max is 4114612224 To try to rule out amount of memory, I set it to 16GB (we're on a virtual environment), with 4GB of it for Cassandra heap but that didn't help either, the incredibly long GC pauses keep coming. So I think something else is causing these issues, unless everyone is having really long GC pauses (which I doubt). I came across this thread: http://www.mail-archive.com/user@cassandra.apache.org/msg24042.html suggesting # date -s “`date`” might help my issues. It didn't however (unless I am supposed to replace that second date with the actual date?). Has anyone had similar issues? 2013/4/17 aaron morton aa...@thelastpickle.com INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600 This does not say that the heap is full. ParNew is GC activity for the new heap, which is typically a smaller part of the overall heap. It sounds like you are running with defaults for the memory config, which is generally a good idea. But 4GB total memory for a node is on the small size. Try some changes, edit the cassandra-env.sh file and change MAX_HEAP_SIZE=2G HEAP_NEWSIZE=400M You may also want to try: MAX_HEAP_SIZE=2G HEAP_NEWSIZE=800M JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=4 JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=2 The size of the new heap generally depends on the number of cores available, see the commends in the -env file. An older discussion about memory use, not that in 1.2 the bloom filters (and compression data) are off heap now. 
http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html Hope that helps. - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 17/04/2013, at 11:06 PM, Joel Samuelsson samuelsson.j...@gmail.com wrote: You're right, it's probably hard. I should have provided more data. I'm running Ubuntu 10.04 LTS with JNA installed. I believe this line in the log indicates that JNA is working, please correct me if I'm wrong: CLibrary.java (line 111) JNA mlockall successful Total amount of RAM is 4GB. My description of data size was very bad. Sorry about that. Data set size is 12.3 GB per node, compressed. Heap size is 998.44MB according to nodetool info. Key cache is 49MB bytes according to nodetool info. Row cache size is 0 bytes acoording to nodetool info. Max new heap is 205MB kbytes according to Memory Pool Par Eden Space max in jconsole. Memtable is left at default which should give it 333MB according to documentation (uncertain where I can verify this). Our production cluster seems similar to your dev cluster so possibly increasing the heap to 2GB might help our issues. I am still interested in getting rough estimates of how much heap will be needed as data grows. Other than empirical studies how would I go about getting such estimates? 2013/4/16 Viktor Jevdokimov viktor.jevdoki...@adform.com How one could provide any help without any knowledge about your cluster, node and environment settings? 40GB was calculated from 2 nodes with RF=2 (each has 100% data range), 2.4-2.5M rows * 6 cols * 3kB as a minimum without compression and any overhead (sstable, bloom filters and indexes). With ParNew GC time such as yours even if it is a swapping issue I could say only that heap size is too small. Check Heap, New Heap sizes, memtable and cache sizes. Are you on Linux? Is JNA installed and used? What is total amount of RAM? 
Just for a DEV environment we use 3 virtual machines with 4GB RAM and use 2GB heap without any GC issue with amount of data from 0 to 16GB compressed on each node. Memtable space sized to 100MB, New Heap 400MB. Best regards, Viktor Jevdokimov, Senior Developer, Adform
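The GCInspector lines quoted in this thread have a regular shape, so the pauses can be pulled out of system.log mechanically. A minimal sketch, assuming the 1.x log format shown above; the regex and threshold are illustrative choices, not anything from Cassandra itself:

```python
import re

# Matches GCInspector lines like those quoted in the thread (Cassandra 1.x):
# "GC for ParNew: 99342 ms for 1 collections, 1400754488 used; max is 4114612224"
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections?, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)

def long_pauses(log_lines, threshold_ms=1000):
    """Return (collector, pause_ms, heap_used_fraction) for slow collections."""
    hits = []
    for line in log_lines:
        m = GC_LINE.search(line)
        if m and int(m.group("ms")) >= threshold_ms:
            hits.append((m.group("collector"),
                         int(m.group("ms")),
                         int(m.group("used")) / int(m.group("max"))))
    return hits

sample = [
    "INFO [ScheduledTasks:1] 2013-06-07 10:37:02,537 GCInspector.java (line 122) "
    "GC for ParNew: 99342 ms for 1 collections, 1400754488 used; max is 4114612224",
]
for collector, ms, frac in long_pauses(sample):
    print(f"{collector}: {ms} ms pause, heap {frac:.0%} used")
```

Run against the sample line above, it reports the 99-second ParNew pause with the heap only about a third full, which is exactly the symptom being discussed: the pause is not caused by a full heap.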
Re: Reduce Cassandra GC
You're right, it's probably hard. I should have provided more data. I'm running Ubuntu 10.04 LTS with JNA installed. I believe this line in the log indicates that JNA is working, please correct me if I'm wrong: CLibrary.java (line 111) JNA mlockall successful Total amount of RAM is 4GB. My description of data size was very bad. Sorry about that. Data set size is 12.3 GB per node, compressed. Heap size is 998.44 MB according to nodetool info. Key cache is 49 MB according to nodetool info. Row cache size is 0 bytes according to nodetool info. Max new heap is 205 MB according to Memory Pool Par Eden Space max in jconsole. Memtable is left at default, which should give it 333 MB according to documentation (uncertain where I can verify this). Our production cluster seems similar to your dev cluster so possibly increasing the heap to 2GB might help our issues. I am still interested in getting rough estimates of how much heap will be needed as data grows. Other than empirical studies, how would I go about getting such estimates?

2013/4/16 Viktor Jevdokimov viktor.jevdoki...@adform.com How could one provide any help without any knowledge about your cluster, node and environment settings? 40GB was calculated from 2 nodes with RF=2 (each has 100% data range), 2.4-2.5M rows * 6 cols * 3kB as a minimum without compression and any overhead (sstable, bloom filters and indexes). With ParNew GC time such as yours, even if it is a swapping issue, I could say only that the heap size is too small. Check Heap, New Heap sizes, memtable and cache sizes. Are you on Linux? Is JNA installed and used? What is the total amount of RAM? Just for a DEV environment we use 3 virtual machines with 4GB RAM and use 2GB heap without any GC issue with amount of data from 0 to 16GB compressed on each node. Memtable space sized to 100MB, New Heap 400MB.
Best regards, Viktor Jevdokimov

From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com] Sent: Tuesday, April 16, 2013 12:52 To: user@cassandra.apache.org Subject: Re: Reduce Cassandra GC

How do you calculate the heap / data size ratio? Is this a linear ratio? Each node has slightly more than 12 GB right now though.

2013/4/16 Viktor Jevdokimov viktor.jevdoki...@adform.com For 40GB of data 1GB of heap is too low. Best regards, Viktor Jevdokimov

From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com] Sent: Tuesday, April 16, 2013 10:47 To: user@cassandra.apache.org Subject: Reduce Cassandra GC

Hi, We have a small production cluster with two nodes. The load on the nodes is very small, around 20 reads / sec and about the same for writes. There are around 2.5 million keys in the cluster and a RF of 2. About 2.4 million of the rows are skinny (6 columns) and around 3kb
Re: Reduce Cassandra GC
How do you calculate the heap / data size ratio? Is this a linear ratio? Each node has slightly more than 12 GB right now though.

2013/4/16 Viktor Jevdokimov viktor.jevdoki...@adform.com For 40GB of data 1GB of heap is too low. Best regards, Viktor Jevdokimov

From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com] Sent: Tuesday, April 16, 2013 10:47 To: user@cassandra.apache.org Subject: Reduce Cassandra GC

Hi, We have a small production cluster with two nodes. The load on the nodes is very small, around 20 reads / sec and about the same for writes. There are around 2.5 million keys in the cluster and a RF of 2. About 2.4 million of the rows are skinny (6 columns) and around 3kb in size (each). Currently, scripts are running, accessing all of the keys in time order to do some calculations. While running the scripts, the nodes go down and then come back up 6-7 minutes later. This seems to be due to GC. I get lines like this in the log: INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600 However, the heap is not full. The heap usage has a jagged pattern going from 60% up to 70% during 5 minutes and then back down to 60% the next 5 minutes and so on. I get no Heap is X full... messages. Every once in a while at one of these peaks, I get these stop-the-world GCs for 6-7 minutes. Why does GC take up so much time even though the heap isn't full? I am aware that my access patterns make key caching very unlikely to be high. And indeed, my average key cache hit ratio during the run of the scripts is around 0.5%. I tried disabling key caching on the accessed column family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the cassandra-cli but I get the same behaviour. Is turning the key cache off effective immediately? Stop-the-world GC is fine if it happens for a few seconds, but having it happen for several minutes doesn't work. Any other suggestions to remove them? Best regards, Joel Samuelsson
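Viktor's "40GB" figure earlier in the thread is a back-of-envelope multiplication of the numbers given here: roughly 2.4M rows, 6 columns each, about 3 kB per column, with RF=2 on 2 nodes so each node holds a full replica. A minimal sketch of that arithmetic, ignoring compression and per-sstable overhead exactly as he does:

```python
# Rough raw data-size estimate from the thread's numbers. With RF=2 on a
# 2-node cluster, each node owns 100% of the data, so the per-node figure
# equals the full data set. Overheads (sstable metadata, bloom filters,
# indexes) and compression are deliberately ignored, as in the thread.
def raw_data_bytes(rows, cols_per_row, bytes_per_col):
    return rows * cols_per_row * bytes_per_col

per_node = raw_data_bytes(2_400_000, 6, 3 * 1024)
print(f"~{per_node / 1024**3:.0f} GB per node before compression/overhead")
```

This lands at roughly the 40 GB Viktor quotes, against which a ~1 GB heap is clearly undersized. It is only a lower bound; there is no simple linear heap/data ratio, which is why the thread falls back on empirical sizing.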
Re: Clearing tombstones
Yeah, I didn't mean normal as in what most people use. I meant that they are not strange like Tyler mentions.

2013/3/28 aaron morton aa...@thelastpickle.com The cleanup operation took several minutes though. This doesn't seem normal then It read all the data and made sure the node was a replica for it. Since a single node cluster replicates all data, there was not a lot to throw away. My replication settings should be very normal (simple strategy and replication factor 1). Most people use the Network Topology Strategy and RF 3, even if they don't have multiple DCs. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com

On 28/03/2013, at 3:34 AM, Joel Samuelsson samuelsson.j...@gmail.com wrote: I see. The cleanup operation took several minutes though. This doesn't seem normal then? My replication settings should be very normal (simple strategy and replication factor 1).

2013/3/26 Tyler Hobbs ty...@datastax.com On Tue, Mar 26, 2013 at 5:39 AM, Joel Samuelsson samuelsson.j...@gmail.com wrote: Sorry. I failed to mention that all my CFs had a gc_grace_seconds of 0 since it's a 1 node cluster. I managed to accomplish what I wanted by first running cleanup and then compact. Is there any logic to this or should my tombstones be cleared by just running compact? There's nothing for cleanup to do on a single node cluster (unless you've changed your replication settings in a strange way, like setting no replicas for a keyspace). Just doing a major compaction will take care of tombstones that are gc_grace_seconds old. -- Tyler Hobbs DataStax http://datastax.com/
Re: Clearing tombstones
I see. The cleanup operation took several minutes though. This doesn't seem normal then? My replication settings should be very normal (simple strategy and replication factor 1). 2013/3/26 Tyler Hobbs ty...@datastax.com On Tue, Mar 26, 2013 at 5:39 AM, Joel Samuelsson samuelsson.j...@gmail.com wrote: Sorry. I failed to mention that all my CFs had a gc_grace_seconds of 0 since it's a 1 node cluster. I managed to accomplish what I wanted by first running cleanup and then compact. Is there any logic to this or should my tombstones be cleared by just running compact? There's nothing for cleanup to do on a single node cluster (unless you've changed your replication settings in a strange way, like setting no replicas for a keyspace). Just doing a major compaction will take care of tombstones that are gc_grace_seconds old. -- Tyler Hobbs DataStax http://datastax.com/
Re: Clearing tombstones
Sorry. I failed to mention that all my CFs had a gc_grace_seconds of 0 since it's a 1 node cluster. I managed to accomplish what I wanted by first running cleanup and then compact. Is there any logic to this or should my tombstones be cleared by just running compact? 2013/3/25 Tyler Hobbs ty...@datastax.com You'll need to temporarily lower gc_grace_seconds for that column family, run compaction, and then restore gc_grace_seconds to its original value. See http://wiki.apache.org/cassandra/DistributedDeletes for more info. On Mon, Mar 25, 2013 at 7:40 AM, Joel Samuelsson samuelsson.j...@gmail.com wrote: Hi, I've deleted a range of keys in my one node test-cluster and want to re-add them with an older creation time. How can I make sure all tombstones are gone so that they can be re-added properly? I've tried nodetool compact but it seems some tombstones remain. Best regards, Joel Samuelsson -- Tyler Hobbs DataStax http://datastax.com/
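The rule Tyler describes can be shown with a toy model: compaction drops a tombstone only once it is older than gc_grace_seconds, so with gc_grace_seconds=0 (as on this single-node cluster) a major compaction purges everything immediately. This is an illustrative sketch of the rule, not Cassandra's actual implementation; the cell tuple layout is invented for the example:

```python
import time

# Toy model of tombstone purging during compaction. A tombstone survives
# compaction until it is at least gc_grace_seconds old; live cells always
# survive. cell = (key, is_tombstone, deletion_timestamp_seconds).
def compact(cells, gc_grace_seconds, now=None):
    now = time.time() if now is None else now
    kept = []
    for key, is_tombstone, ts in cells:
        if is_tombstone and now - ts >= gc_grace_seconds:
            continue  # old enough: purged by this compaction
        kept.append((key, is_tombstone, ts))
    return kept

now = 1_000_000
cells = [("a", False, 0), ("b", True, now - 100), ("c", True, now - 10)]
# gc_grace_seconds=0, as on the single-node cluster in this thread:
print(compact(cells, gc_grace_seconds=0, now=now))
# With the default ten days, both tombstones survive this compaction:
print(compact(cells, gc_grace_seconds=864_000, now=now))
```

With grace zero only the live cell remains after one compaction, which is why cleanup contributed nothing here and a plain major compaction sufficed.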
Clearing tombstones
Hi, I've deleted a range of keys in my one node test-cluster and want to re-add them with an older creation time. How can I make sure all tombstones are gone so that they can be re-added properly? I've tried nodetool compact but it seems some tombstones remain. Best regards, Joel Samuelsson
Re: Cassandra freezes
Thanks for the GC suggestion. It seems we didn't have enough CPU power to handle both the data and GC. Increasing the number of CPU cores made everything run smoothly at the same load.

2013/3/21 Andras Szerdahelyi andras.szerdahe...@ignitionone.com Neat! Thanks. From: Sylvain Lebresne sylv...@datastax.com Reply-To: user@cassandra.apache.org Date: Thursday 21 March 2013 10:10 To: user@cassandra.apache.org Subject: Re: Cassandra freezes

Prior to 1.2 the index summaries were not saved on disk, and were thus computed on startup while the sstable was loaded. In 1.2 they are now saved on disk to make startup faster ( https://issues.apache.org/jira/browse/CASSANDRA-2392). That being said, if the index_interval value used by a saved summary doesn't match the current one while the sstable is loaded, the summary is recomputed anyway, so restarting a node should always take a new index_interval setting into account. -- Sylvain

On Thu, Mar 21, 2013 at 9:43 AM, Andras Szerdahelyi andras.szerdahe...@ignitionone.com wrote: I cannot find the reference that notes having to run upgradesstables when you change this. I really hope such complex assumptions are not formulating in my head just on their own and there actually exists some kind of reliable reference that clears this up :-) but, # index_interval controls the sampling of entries from the primary # row index in terms of space versus time. The larger the interval, # the smaller and less effective the sampling will be. In technical # terms, the interval corresponds to the number of index entries that # are skipped between taking each sample. All the sampled entries # must fit in memory. Generally, a value between 128 and 512 here # coupled with a large key cache size on CFs results in the best trade # offs.
This value is not often changed, however if you have many # very small rows (many to an OS page), then increasing this will # often lower memory usage without an impact on performance. It is (very) safe to assume the row index is re-built/updated when new sstables are built. Obviously the sample of this index will have to follow this process very closely. It is possible, however, that the sample itself is not persisted and is built at startup, as opposed to *only* when the index changes (which is what I thought was happening). It shouldn't be too difficult to verify this, but I'd appreciate it if someone who looked at this before could confirm whether this is the case. Thanks, Andras

On 21/03/13 09:13, Michal Michalski mich...@opera.com wrote: About index_interval: 1) you have to rebuild sstables (not an issue if you are evaluating, doing test writes.. etc, not so much in production) Are you sure of this? As I understand indexes, it's not required because this parameter defines an interval of the in-memory index sample, which is created during C* startup based on the primary on-disk index file. The fact that heap usage is reduced immediately after C* restart seems to confirm this, but maybe I miss something? M.
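The cassandra.yaml comment quoted above describes a simple every-Nth sampling of the on-disk primary row index: a larger index_interval keeps fewer entries in memory at the cost of coarser seeks. A minimal illustrative sketch of that space/time trade-off (not Cassandra's code; the key names are made up):

```python
# Sketch of index_interval sampling as described in the quoted yaml comment:
# of every `interval` consecutive primary-index entries, one is kept in the
# in-memory sample. A larger interval -> smaller sample -> less heap, but a
# lookup must then scan more on-disk index entries from the nearest sample.
def sample_index(index_entries, interval):
    """Keep every `interval`-th entry of the on-disk primary row index."""
    return index_entries[::interval]

entries = [f"key{i:05d}" for i in range(100_000)]
small = sample_index(entries, 128)   # interval at the low end of 128-512
large = sample_index(entries, 512)   # ~4x less memory, coarser positioning
print(len(small), len(large))
```

Because the sample is derived purely from the on-disk index, it can be rebuilt at startup with a new interval, which matches Sylvain's point that a restart always picks up a changed index_interval without rewriting sstables.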