what does nodetool compact command do for leveled compactions?
Hi, I have a column family created with the leveled compaction strategy. If I execute the nodetool compact command, will the column family be compacted using the size-tiered compaction strategy? If so, after that major size-tiered compaction finishes, will leveled compaction ever be triggered on the column family again? Thanks, Rashmi
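For reference, a major compaction of one column family is requested like this (keyspace and column family names are placeholders); whether it runs size-tiered or leveled under the hood is exactly the question above, so check the behavior on your version:

```shell
# Request a major compaction of a single column family
# (keyspace/column family names here are placeholders).
nodetool -h localhost compact my_keyspace my_cf
```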
1.2 leveled compactions can affect big bunch of writes? how to stop/restart them?
Hi, In general leveled compaction is I/O heavy, so when a big batch of writes arrives, do we need to stop leveled compactions at all? I found nodetool stop COMPACTION, which is documented to stop a compaction in progress; does this work for every type of compaction? The documentation also states that 'eventually cassandra restarts the compaction'; isn't there a way to control manually when compaction starts again? If this is not applicable to leveled compactions in 1.2, what can be used to stop/restart those? Thanks, Rashmi
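For what it's worth, the stop/throttle knobs look roughly like this on a 1.2-era node (a sketch; verify the exact subcommands against your nodetool version):

```shell
# Stop compactions currently in progress (Cassandra may schedule new ones later).
nodetool stop COMPACTION

# Throttle compaction I/O at runtime instead of stopping it outright
# (value is in MB/s; 0 disables throttling altogether).
nodetool setcompactionthroughput 8
```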
Re: 1.2 leveled compactions can affect big bunch of writes? how to stop/restart them?
Thanks for the responses. Nate, I haven't tried changing compaction_throughput_mb_per_sec; in my cassandra.yaml I had set it to 32 to begin with. Do you think 32 can be too much if Cassandra receives writes only once in a while, but each batch arrives as one big chunk?

On Thu, Sep 19, 2013 at 12:33 PM, sankalp kohli <kohlisank...@gmail.com> wrote:

You cannot start leveled compaction manually. It will run based on the data in each level.

On Thu, Sep 19, 2013 at 9:19 AM, Nate McCall <n...@thelastpickle.com> wrote:

As opposed to stopping compaction altogether, have you experimented with turning down compaction_throughput_mb_per_sec (16 MB default) and/or explicitly setting concurrent_compactors (defaults to the number of cores, iirc)?

On Thu, Sep 19, 2013 at 10:58 AM, rash aroskar <rashmi.aros...@gmail.com> wrote:

Hi, In general leveled compaction is I/O heavy, so when a big batch of writes arrives, do we need to stop leveled compactions at all? I found nodetool stop COMPACTION, which is documented to stop a compaction in progress; does this work for every type of compaction? The documentation also states that 'eventually cassandra restarts the compaction'; isn't there a way to control manually when compaction starts again? If this is not applicable to leveled compactions in 1.2, what can be used to stop/restart those? Thanks, Rashmi
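The settings Nate mentions live in cassandra.yaml; a sketch (the values shown are illustrative, not recommendations):

```yaml
# Global throttle shared by all compactions on the node, in MB/s.
# 16 is the shipped default; the poster had raised it to 32.
compaction_throughput_mb_per_sec: 16

# Cap on how many compactions may run in parallel
# (defaults to the number of cores when left unset).
concurrent_compactors: 2
```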
making sure 1 copy per availability zone(rack) using EC2Snitch
Hello, I am planning my new Cassandra 1.2.5 cluster with all nodes in a single region, divided equally between 2 availability zones. I want to make sure that with replication factor 2 I get 1 copy in each availability zone. To my knowledge, using the EC2Snitch should take care of this. Is this correct? Do I need to specify any strategy options? Thanks. Rashmi
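The rack-aware placement being asked about can be sketched with a toy model: walk the ring from the key's position and prefer nodes in racks (availability zones) that do not yet hold a replica. This is only an illustration of the idea, not Cassandra's actual placement code; the node names and zone names are made up.

```python
# Toy model of rack-aware replica placement (NetworkTopologyStrategy-style).
# Walk the ring clockwise and prefer nodes whose rack has no replica yet;
# fall back to already-seen racks only if there are fewer racks than RF.

def place_replicas(ring, racks, rf):
    """ring: node ids in token order; racks: node -> rack name; rf: replica count."""
    replicas, seen_racks, skipped = [], set(), []
    for node in ring:
        if len(replicas) == rf:
            break
        if racks[node] not in seen_racks:
            replicas.append(node)
            seen_racks.add(racks[node])
        else:
            skipped.append(node)
    # Fewer racks than rf: fill the remainder from the nodes we skipped.
    for node in skipped:
        if len(replicas) == rf:
            break
        replicas.append(node)
    return replicas

ring = ["n1", "n2", "n3", "n4"]
racks = {"n1": "us-east-1a", "n2": "us-east-1a",
         "n3": "us-east-1b", "n4": "us-east-1b"}
print(place_replicas(ring, racks, 2))  # → ['n1', 'n3']: one copy per zone
```

With RF 2 and two zones, each key ends up with one replica per zone, which is the behavior the question is after.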
Re: making sure 1 copy per availability zone(rack) using EC2Snitch
Thanks for the quick response, Rob. Are you suggesting deploying 1.2.9 only if using a Cassandra DC outside of EC2, or also if I wish to use rack replication at all?

On Mon, Sep 9, 2013 at 12:43 PM, Robert Coli <rc...@eventbrite.com> wrote:

On Mon, Sep 9, 2013 at 8:56 AM, rash aroskar <rashmi.aros...@gmail.com> wrote:

I am planning my new Cassandra 1.2.5 cluster with all nodes in a single region, divided equally between 2 availability zones. I want to make sure that with replication factor 2 I get 1 copy in each availability zone. To my knowledge, using the EC2Snitch should take care of this. Is this correct? Do I need to specify any strategy options?

https://issues.apache.org/jira/browse/CASSANDRA-3810 has background on how one must configure all rack-aware snitches. If you may ever have a Cassandra DC outside of EC2, for example for disaster recovery, use GossipingPropertyFileSnitch. Also, deploy on 1.2.9, not 1.2.5. =Rob
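If GossipingPropertyFileSnitch is chosen, each node advertises its datacenter and rack from a small properties file; a sketch (the dc/rack values are placeholders):

```
# cassandra-rackdc.properties on each node
# (read by GossipingPropertyFileSnitch)
dc=us-east
rack=us-east-1a
```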
aws VPC for cassandra
Hello, Has anyone used an AWS VPC for a Cassandra cluster? The static private IPs of a VPC should be helpful in case of node replacement. Please share any related experiences, or suggest ideas for static IPs in EC2 for Cassandra. -Rashmi
Re: Vnodes, adding a node ?
What do you mean by not adding as a seed? If I add a new node to an existing cluster, should the new node not be listed as a seed in cassandra.yaml on the other nodes in the ring? When should it be added as a seed, then? Once the cluster is balanced, or after manually running the rebuild command?

On Wed, Aug 14, 2013 at 3:34 PM, Andrew Cobley <a.e.cob...@dundee.ac.uk> wrote:

That looks like the problem. I added the node with that machine as a seed, realized my mistake, and restarted the machine with the correct seed. It joined the ring but without streaming. nodetool rebuild, however, doesn't seem to be fixing the situation. I'll remove the node and try re-adding it after cleaning /var/lib/cassandra. Andy

From: Richard Low [rich...@wentnet.com]
Sent: 14 August 2013 20:11
To: user@cassandra.apache.org
Subject: Re: Vnodes, adding a node ?

On 14 August 2013 20:02, Andrew Cobley <a.e.cob...@dundee.ac.uk> wrote:

I have a small test cluster of 2 nodes. I ran a stress test on it, and nodetool status reported the following:

/usr/local/bin/apache-cassandra-2.0.0-rc1/log $ ../bin/nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.0.11  141.13 MB  256     49.2%             4d281e2e-efd9-4abf-bb70-ebdf8e2b4fc3  rack1
UN  192.168.0.10  145.59 MB  256     50.8%             7fc5795a-bd1b-4e42-88d6-024c5216a893  rack1

I then added a third node with no machines writing to the system. nodetool status then reported:

/usr/local/bin/apache-cassandra-2.0.0-rc1/log $ ../bin/nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.0.11  141.12 MB  256     32.2%             4d281e2e-efd9-4abf-bb70-ebdf8e2b4fc3  rack1
UN  192.168.0.10  145.59 MB  256     35.3%             7fc5795a-bd1b-4e42-88d6-024c5216a893  rack1
UN  192.168.0.12  111.9 KB   256     32.5%             e5e6d8bd-c652-4c18-8fa3-3d71471eee65  rack1

Is this correct? I was under the impression that adding a node to an existing cluster would distribute the load around the cluster. Am I perhaps missing a step, or do I have a config error?

How did you add the node? It looks like it didn't bootstrap but just joined the ring. You need to make sure the node is not set as a seed and that auto_bootstrap is true (the default). Alternatively, you could run 'nodetool rebuild' to stream data from the other nodes. Richard.
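A sketch of the relevant cassandra.yaml pieces on the joining node (the IPs are placeholders; the seed list must name existing cluster members only, never the new node itself):

```yaml
# cassandra.yaml on the NEW node (1.2-style layout; IPs are placeholders).
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # Existing cluster members only -- never the joining node itself.
          - seeds: "192.168.0.10,192.168.0.11"

# Defaults to true; a new node must not have this set to false.
auto_bootstrap: true
```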
cassandra snapshot 1.2.5 , stores ttl?
Hi, Suppose I add some data to a Cassandra cluster with a TTL of, let's say, 2 days, and take a snapshot of it before it expires. If I use the snapshot to load the data into a different (or the same) cluster, will the data from the snapshot carry the TTL of 2 days (counted from the time when the snapshot was created)? If not, can I specify a TTL when loading data from the snapshot? Would using Priam make this process any easier? Thanks. -Rashmi
Re: cassandra snapshot 1.2.5 , stores ttl?
Got it. Thanks for the response.

On Thu, Aug 15, 2013 at 4:26 PM, Tyler Hobbs <ty...@datastax.com> wrote:

Snapshots just hardlink the existing SSTable files. They don't freeze the TTL or anything like that, so data can expire while snapshotted, and it will be converted to a tombstone when it is first compacted after you reload it. There's no easy way to prevent this from happening.

On Thu, Aug 15, 2013 at 1:13 PM, rash aroskar <rashmi.aros...@gmail.com> wrote:

Hi, Suppose I add some data to a Cassandra cluster with a TTL of, let's say, 2 days, and take a snapshot of it before it expires. If I use the snapshot to load the data into a different (or the same) cluster, will the data from the snapshot carry the TTL of 2 days (counted from the time when the snapshot was created)? If not, can I specify a TTL when loading data from the snapshot? Would using Priam make this process any easier? Thanks. -Rashmi

--
Tyler Hobbs
DataStax
http://datastax.com/
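Tyler's point is that the expiry timestamp travels with the data: a snapshot hardlinks the SSTables and does not pause the TTL clock. Toy arithmetic to make that concrete (times in seconds; the timeline values are made up):

```python
# Data written at t=0 with a 2-day TTL; snapshot taken after 1 day;
# snapshot restored into a cluster after 3 days.
WRITE_TIME = 0
TTL = 2 * 24 * 3600             # 172800 s
SNAPSHOT_TIME = 1 * 24 * 3600   # snapshotting does NOT move the expiry
RESTORE_TIME = 3 * 24 * 3600

expiry = WRITE_TIME + TTL              # fixed at write time
remaining_at_restore = expiry - RESTORE_TIME
print(remaining_at_restore)  # -86400: already expired; first compaction
                             # after the reload turns it into a tombstone
```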
Re: cassandra 1.2.5- virtual nodes (num_token) pros/cons?
Aaron, I read about virtual nodes at http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2

On Tue, Aug 6, 2013 at 4:49 AM, Richard Low <rich...@wentnet.com> wrote:

On 6 August 2013 08:40, Aaron Morton <aa...@thelastpickle.com> wrote:

"The reason for me looking at virtual nodes is the terrible experiences we had with 0.8 repairs; as per the documentation (and logically), virtual nodes seem like they will help repairs run more smoothly. Is this true?"

I've not thought too much about how they help repair run more smoothly; what was the documentation you read?

There might be a slight improvement, but I haven't observed any. The difference might be that, because every node shares replicas with every other node (with high probability), a single repair operation does the same work on the node it was called on, but the rest is spread out over the cluster rather than falling just on the RF nodes either side of the repairing node. This means the post-repair compaction work will take less time, and the length of time a node is loaded during repair is shorter. However, the other benefits of vnodes are likely to be much more useful. Richard.
Cassandra 1.2.5 which compressor is better?
Hi, I am setting up a new cluster with Cassandra 1.2.5, and this is my first time using Cassandra compression. From what I have read about the compressors, I gathered that SnappyCompressor gives better compression but is slightly slower than LZ4Compressor. I just wanted to hear your experiences and/or opinions: Snappy vs LZ4, which compressor is better in the case of huge data with few writes but lots of reads? Thanks. -Rashmi
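Either way, compression is configured per column family, so it is easy to switch the two and compare; a 1.2-era CQL sketch (keyspace and table names are placeholders):

```
-- Newly written SSTables will use the new compressor; existing ones
-- keep theirs until rewritten by compaction.
ALTER TABLE my_ks.my_cf
  WITH compression = {'sstable_compression': 'LZ4Compressor'};
```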
Re: cassandra 1.2.5- virtual nodes (num_token) pros/cons?
Thanks for the helpful responses. The upgrade from 0.8 to 1.2 is not direct; we have set up a test cluster where we upgraded from 0.8 to 1.1 and then to 1.2. Also, we will build a whole different cluster with 1.2; the 0.8 cluster will not be upgraded, but its data will be moved to the 1.2 cluster. The reason for me looking at virtual nodes is the terrible experiences we had with 0.8 repairs; as per the documentation (and logically), virtual nodes seem like they will help repairs run more smoothly. Is this true? Also, how do I get the right number of virtual nodes? David suggested 64 vnodes for 20 machines. Is there a formula or thought process to follow to get this number right?

On Mon, Jul 29, 2013 at 4:15 AM, aaron morton <aa...@thelastpickle.com> wrote:

"I would *strongly* recommend against upgrading from 0.8 directly to 1.2."
Skipping a major version is generally not recommended; skipping three would seem like carelessness.

"I second Romain, do the upgrade and make sure the health is good first."
+1. But I would also recommend deciding whether you actually need to use virtual nodes. The shuffle process can take a long time, and people have had mixed experiences with it. If you wanted to move to 1.2 and get vnodes, I would consider spinning up a new cluster and bulk loading into it. You could do an initial load and then delta loads using snapshots; there would, however, be a period of stale data in the new cluster until the last delta snapshot is loaded. Cheers

- Aaron Morton
Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com

On 27/07/2013, at 3:36 AM, David McNelis <dmcne...@gmail.com> wrote:

I second Romain, do the upgrade and make sure the health is good first. If you have or plan to have a large number of nodes, you might consider using fewer than 256 as your initial vnodes amount. I think that number in the docs is inflated beyond what is reasonable; some people have reported potential performance degradation with a large number of nodes and a very high number of vnodes. If I had it to do over again, I'd have used 64 vnodes as my default (across 20 nodes).

Another thing to be very cognizant of before shuffling is disk space. You *must* have less than 50% used in order to do the shuffle successfully, because no data is removed (cleaned) from a node during the shuffle process; the shuffle essentially doubles the amount of data until you're able to run a cleanup.

On Fri, Jul 26, 2013 at 11:25 AM, Romain HARDOUIN <romain.hardo...@urssaf.fr> wrote:

Vnodes are a great feature. More nodes are involved during operations such as bootstrap, decommission, etc. The DataStax documentation is definitely a must-read. That said, if I were you, I'd wait a while before shuffling the ring. I'd focus on the cluster upgrade and on monitoring the nodes (number of file handles, memory usage, latency, etc.). Upgrading from 0.8 to 1.2 can be tricky; there are so many changes since then. Be careful about the compaction strategies you choose, and double-check the options. Regards, Romain

rash aroskar <rashmi.aros...@gmail.com> wrote on 25/07/2013 23:25:11:
From: rash aroskar <rashmi.aros...@gmail.com>
To: user@cassandra.apache.org
Date: 25/07/2013 23:25
Subject: cassandra 1.2.5- virtual nodes (num_token) pros/cons?

Hi, I am upgrading my Cassandra cluster from 0.8 to 1.2.5. In Cassandra 1.2.5 the 'num_tokens' attribute confuses me. I understand that it distributes multiple tokens per node, but I am not clear on how that is helpful for performance or load balancing. Can anyone elaborate? Has anyone used this feature and knows its advantages/disadvantages? Thanks, Rashmi
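Richard's point elsewhere in this thread, that repair load spreads across the cluster under vnodes, can be illustrated with a toy ring simulation: with one token per node, a node's ranges border only a couple of neighbours, while with many tokens per node they border almost every node in the cluster. This is a sketch of the idea, not Cassandra's code; the cluster sizes and token counts are made up.

```python
# Toy simulation: how many distinct peers participate when node 0 repairs,
# as a function of tokens per node (single-rack ring, simplified RF=2 view).
import random

def repair_peers(num_nodes, tokens_per_node, seed=42):
    rng = random.Random(seed)
    ring = []  # (token, owning node), sorted by token below
    for node in range(num_nodes):
        for _ in range(tokens_per_node):
            ring.append((rng.random(), node))
    ring.sort()
    peers = set()
    # For each range owned by node 0, the ring predecessor bounds that
    # range, so its owner takes part in node 0's repair.
    for i, (_, node) in enumerate(ring):
        if node == 0:
            peers.add(ring[i - 1][1])
    peers.discard(0)
    return len(peers)

print(repair_peers(20, 1))   # → 1: only the immediate neighbour
print(repair_peers(20, 64))  # close to 19: nearly the whole cluster
```

With 64 tokens on 20 nodes, nearly every node shares a range boundary with node 0, so each peer streams and compacts only a small slice of the repair work.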
Re: How often to run `nodetool repair`
We observed the same behavior. During the last repair, the data distribution across nodes was imbalanced as well, resulting in one node bloating.

On Aug 1, 2013 12:36 PM, Carl Lerche <m...@carllerche.com> wrote:

Hello, I read in the docs that nodetool repair should be run regularly unless no delete is ever performed. In my app I never delete, but I heavily use the TTL feature. Should repair still be run regularly? Also, does repair take less time if it is run regularly? If not, is there a way to run it incrementally? It seems that when I do run repair, it takes a long time and causes high amounts of CPU usage and iowait. Thoughts? Thanks, Carl
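A commonly used pattern (verify the flag against your version) is to rotate a primary-range repair across the nodes, so the same data is not repaired once per replica:

```shell
# Repair only this node's primary ranges; run it on every node in turn,
# completing a full cycle within gc_grace_seconds (default 10 days).
nodetool repair -pr
```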
cassandra 1.2.5- virtual nodes (num_token) pros/cons?
Hi, I am upgrading my Cassandra cluster from 0.8 to 1.2.5. In Cassandra 1.2.5 the 'num_tokens' attribute confuses me. I understand that it distributes multiple tokens per node, but I am not clear on how that is helpful for performance or load balancing. Can anyone elaborate? Has anyone used this feature and knows its advantages/disadvantages? Thanks, Rashmi