what does the nodetool compact command do for leveled compaction?

2013-10-24 Thread rash aroskar
Hi,
I have a column family created with the leveled compaction strategy. If I
execute the nodetool compact command, will the column family be compacted
using the size-tiered compaction strategy?
If yes, after the major size-tiered compaction finishes, will leveled
compaction ever be triggered again on the column family?


Thanks,
Rashmi
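
For reference, a minimal sketch of invoking a major compaction on a single
column family and watching it run (keyspace/column family names are
hypothetical):

    # Force a major compaction of one column family:
    $ nodetool -h 127.0.0.1 compact my_keyspace my_cf
    # See which compactions are currently running:
    $ nodetool -h 127.0.0.1 compactionstats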


1.2: can leveled compactions affect a big bunch of writes? how to stop/restart them?

2013-09-19 Thread rash aroskar
Hi,
In general, leveled compaction is I/O heavy, so when a big bunch of writes
arrives, do we need to stop leveled compactions at all?
I found nodetool stop COMPACTION, which is documented to stop a compaction in
progress; does this work for any type of compaction? The docs also state that
'eventually cassandra restarts the compaction'; isn't there a way to control
manually when the compaction starts again?
If this is not applicable to leveled compactions in 1.2, what can be used to
stop/restart those?



Thanks,
Rashmi


Re: 1.2: can leveled compactions affect a big bunch of writes? how to stop/restart them?

2013-09-19 Thread rash aroskar
Thanks for the responses.
Nate - I haven't tried changing compaction_throughput_mb_per_sec; in my
cassandra.yaml I had set it to 32 to begin with. Do you think 32 can be too
much if Cassandra only gets writes once in a while, but each burst arrives as
one big chunk?


On Thu, Sep 19, 2013 at 12:33 PM, sankalp kohli kohlisank...@gmail.com wrote:

 You cannot manually start leveled compaction. It runs on its own, based on
 the amount of data in each level.


 On Thu, Sep 19, 2013 at 9:19 AM, Nate McCall n...@thelastpickle.com wrote:

 As opposed to stopping compaction altogether, have you experimented with
 turning down compaction_throughput_mb_per_sec (16 MB/s default) and/or
 explicitly setting concurrent_compactors (defaults to the number of cores,
 IIRC)?
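
 For example, the throttle can be changed at runtime; a minimal sketch (the
 value is illustrative, and concurrent_compactors itself can only be changed
 via cassandra.yaml plus a restart, as far as I know):

     # Throttle compaction I/O to 8 MB/s until the next restart:
     $ nodetool setcompactionthroughput 8
     # Stop compactions that are currently running; Cassandra may schedule
     # new ones later on its own:
     $ nodetool stop COMPACTION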


 On Thu, Sep 19, 2013 at 10:58 AM, rash aroskar rashmi.aros...@gmail.com wrote:

 Hi,
 In general, leveled compaction is I/O heavy, so when a big bunch of writes
 arrives, do we need to stop leveled compactions at all?
 I found nodetool stop COMPACTION, which is documented to stop a compaction
 in progress; does this work for any type of compaction? The docs also state
 that 'eventually cassandra restarts the compaction'; isn't there a way to
 control manually when the compaction starts again?
 If this is not applicable to leveled compactions in 1.2, what can be used to
 stop/restart those?



 Thanks,
 Rashmi






making sure 1 copy per availability zone (rack) using EC2Snitch

2013-09-09 Thread rash aroskar
Hello,
I am planning my new Cassandra 1.2.5 cluster with all nodes in a single
region, divided equally between 2 availability zones. I want to make sure
that with replication factor 2 I get 1 copy in each availability zone. As far
as I know, using EC2Snitch should take care of this.
Is this correct? Do I need to specify any strategy options?

Thanks.
Rashmi


Re: making sure 1 copy per availability zone (rack) using EC2Snitch

2013-09-09 Thread rash aroskar
Thanks for the quick response, Rob.
Are you suggesting deploying 1.2.9 only if we use a Cassandra DC outside of
EC2, or whenever we want rack-aware replication at all?


On Mon, Sep 9, 2013 at 12:43 PM, Robert Coli rc...@eventbrite.com wrote:

 On Mon, Sep 9, 2013 at 8:56 AM, rash aroskar rashmi.aros...@gmail.com wrote:

 I am planning my new Cassandra 1.2.5 cluster with all nodes in a single
 region, divided equally between 2 availability zones. I want to make sure
 that with replication factor 2 I get 1 copy in each availability zone. As
 far as I know, using EC2Snitch should take care of this.
 Is this correct? Do I need to specify any strategy options?


 https://issues.apache.org/jira/browse/CASSANDRA-3810

 This has background on how one must configure all rack-aware snitches.

 If you may ever have a Cassandra DC outside of EC2, for example for disaster
 recovery, use GossipingPropertyFileSnitch.

 Also, deploy on 1.2.9, not 1.2.5.

 =Rob
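
 For illustration, a minimal sketch of a keyspace definition to pair with
 Ec2Snitch (keyspace name and region are hypothetical). Ec2Snitch reports the
 EC2 region as the datacenter and the availability zone as the rack, so
 NetworkTopologyStrategy will try to place the two replicas in distinct AZs:

     cqlsh> CREATE KEYSPACE my_ks
        ... WITH replication = {'class': 'NetworkTopologyStrategy', 'us-east': 2};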



AWS VPC for Cassandra

2013-08-22 Thread rash aroskar
Hello,
Has anyone used an AWS VPC for a Cassandra cluster? The static private IPs in
a VPC should be helpful in case of node replacement.
Please share any related experiences, or suggest ideas for static IPs in EC2
for Cassandra.

-Rashmi


Re: Vnodes, adding a node ?

2013-08-15 Thread rash aroskar
What do you mean by not adding it as a seed? If I add a new node to an
existing cluster, should the new node not be listed as a seed in
cassandra.yaml on the other nodes in the ring?
When should it be added as a seed, then? Once the cluster is balanced, or
after manually running the rebuild command?



On Wed, Aug 14, 2013 at 3:34 PM, Andrew Cobley a.e.cob...@dundee.ac.uk wrote:

  That looks like the problem.  I added the node with that machine as a
 seed, realized my mistake, and restarted the machine with the correct seed.
 It joined the ring, but without streaming.  nodetool rebuild, however,
 doesn't seem to be fixing the situation.

 I'll remove the node and try re-adding it after cleaning
 /var/lib/cassandra.

 Andy



  --
 From: Richard Low [rich...@wentnet.com]
 Sent: 14 August 2013 20:11
 To: user@cassandra.apache.org
 Subject: Re: Vnodes, adding a node ?

   On 14 August 2013 20:02, Andrew Cobley a.e.cob...@dundee.ac.uk wrote:

 I have a small test cluster of 2 nodes.  I ran a stress test on it, and
 nodetool status reported the following:

 /usr/local/bin/apache-cassandra-2.0.0-rc1/log $ ../bin/nodetool status
 Datacenter: datacenter1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
 UN  192.168.0.11  141.13 MB  256     49.2%             4d281e2e-efd9-4abf-bb70-ebdf8e2b4fc3  rack1
 UN  192.168.0.10  145.59 MB  256     50.8%             7fc5795a-bd1b-4e42-88d6-024c5216a893  rack1

 I then added a third node with no machines writing to the system.  Using
 nodetool status I got the following:

 /usr/local/bin/apache-cassandra-2.0.0-rc1/log $ ../bin/nodetool status
 Datacenter: datacenter1
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
 UN  192.168.0.11  141.12 MB  256     32.2%             4d281e2e-efd9-4abf-bb70-ebdf8e2b4fc3  rack1
 UN  192.168.0.10  145.59 MB  256     35.3%             7fc5795a-bd1b-4e42-88d6-024c5216a893  rack1
 UN  192.168.0.12  111.9 KB   256     32.5%             e5e6d8bd-c652-4c18-8fa3-3d71471eee65  rack1

 Is this correct?  I was under the impression that adding a node to an
 existing cluster would redistribute the load around the cluster. Am I
 missing a step, or do I perhaps have a config error?


  How did you add the node?  It looks like it didn't bootstrap but just
 joined the ring.  You need to make sure the node is not set as a seed and
 that auto_bootstrap is true (the default).

  Alternatively, you could run 'nodetool rebuild' to stream data from the
 other nodes.
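
 A minimal sketch of what to check on the joining node (paths and addresses
 are hypothetical):

     # The new node must not list itself under seeds, and auto_bootstrap must
     # not be set to false (it defaults to true when absent):
     $ grep -nE 'seeds|auto_bootstrap' /etc/cassandra/conf/cassandra.yaml
     # If the node already joined without streaming, pull the data over:
     $ nodetool rebuild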

  Richard.

 The University of Dundee is a registered Scottish Charity, No: SC015096



cassandra snapshot 1.2.5, stores TTL?

2013-08-15 Thread rash aroskar
Hi,
Suppose I add some data to a Cassandra cluster with a TTL of, say, 2 days,
and take a snapshot of it before it expires. If I then use the snapshot to
load the data into a different (or the same) cluster, will the data from the
snapshot carry the TTL of 2 days (counted from when the snapshot was
created)? If not, can I specify a TTL when loading data from the snapshot?
If I use Priam, will it make this process any easier?

Thanks.
-Rashmi


Re: cassandra snapshot 1.2.5, stores TTL?

2013-08-15 Thread rash aroskar
Got it. Thanks for the response.



On Thu, Aug 15, 2013 at 4:26 PM, Tyler Hobbs ty...@datastax.com wrote:

 Snapshots just hardlink the existing SSTable files.  They don't freeze
 the TTL or anything like that, so data can expire while snapshotted and it
 will be converted to a tombstone when it's first compacted after you reload
 it.  There's not an easy way to prevent this from happening.
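
 In other words, the TTL keeps counting down in wall-clock time regardless of
 the snapshot. A minimal sketch (keyspace/table names are hypothetical):

     cqlsh> INSERT INTO my_ks.my_cf (key, val) VALUES ('k1', 'v1')
        ... USING TTL 172800;  -- 2 days
     $ nodetool snapshot my_ks -t before_expiry
     # Restoring this snapshot more than 2 days after the INSERT yields data
     # that has already expired and will be tombstoned on first compaction.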


 On Thu, Aug 15, 2013 at 1:13 PM, rash aroskar rashmi.aros...@gmail.com wrote:

 Hi,
 Suppose I add some data to a Cassandra cluster with a TTL of, say, 2 days,
 and take a snapshot of it before it expires. If I then use the snapshot to
 load the data into a different (or the same) cluster, will the data from the
 snapshot carry the TTL of 2 days (counted from when the snapshot was
 created)? If not, can I specify a TTL when loading data from the snapshot?
 If I use Priam, will it make this process any easier?

 Thanks.
 -Rashmi




 --
 Tyler Hobbs
 DataStax http://datastax.com/



Re: cassandra 1.2.5 - virtual nodes (num_tokens) pros/cons?

2013-08-09 Thread rash aroskar
Aaron - I read about the virtual nodes at
http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2

On Tue, Aug 6, 2013 at 4:49 AM, Richard Low rich...@wentnet.com wrote:

 On 6 August 2013 08:40, Aaron Morton aa...@thelastpickle.com wrote:

 The reason I am looking at virtual nodes is the terrible experiences we had
 with 0.8 repairs; as per the documentation (and logically), virtual nodes
 seem like they will help repairs run more smoothly. Is this true?

 I've not thought too much about how they help repair run more smoothly; what
 was the documentation you read?


 There might be a slight improvement but I haven't observed any.  The
 difference might be that, because every node shares replicas with every
 other (with high probability), a single repair operation does the same work
 on the node it was called on, but the rest is spread out over the cluster,
 rather than just the RF nodes either side of the repairing node.  This
 means the post-repair compaction work will take less time and the length of
 time a node is loaded for during repair is less.

 However, the other benefits of vnodes are likely to be much more useful.

 Richard.



Cassandra 1.2.5: which compressor is better?

2013-08-09 Thread rash aroskar
Hi,
I am setting up a new cluster on Cassandra 1.2.5, and this is my first time
using Cassandra compression.
I read about the compressors and gathered that SnappyCompressor gives better
compression but is slightly slower than the LZ4 compressor. I just wanted to
hear your experiences and/or opinions: Snappy vs LZ4, which compressor is
better for huge data with few writes but lots of reads?
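
For reference, the compressor is a per-column-family setting that can be
changed later; a minimal sketch in CQL3 (table name is hypothetical, and if I
recall correctly LZ4Compressor only ships from 1.2.2 onward):

    cqlsh> ALTER TABLE my_ks.my_cf
       ... WITH compression = {'sstable_compression': 'LZ4Compressor'};
    -- Only newly written SSTables use the new compressor; existing ones are
    -- rewritten when they are next compacted (or via nodetool upgradesstables).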


Thanks.
-Rashmi


Re: cassandra 1.2.5 - virtual nodes (num_tokens) pros/cons?

2013-08-02 Thread rash aroskar
Thanks for the helpful responses. The upgrade from 0.8 to 1.2 is not direct;
we have set up a test cluster where we upgraded from 0.8 to 1.1 and then to
1.2. Also, we will build a whole different cluster on 1.2; the 0.8 cluster
will not be upgraded, but its data will be moved to the 1.2 cluster.
The reason I am looking at virtual nodes is the terrible experiences we had
with 0.8 repairs; as per the documentation (and logically), virtual nodes
seem like they will help repairs run more smoothly. Is this true? Also, how
do we get the right number of virtual nodes? David suggested 64 vnodes for
20 machines. Is there a formula or a thought process to follow to get this
number right?


On Mon, Jul 29, 2013 at 4:15 AM, aaron morton aa...@thelastpickle.com wrote:

 I would *strongly* recommend against upgrading from 0.8 directly to 1.2.
 Skipping a major version is generally not recommended; skipping 3 would seem
 like carelessness.

 I second Romain, do the upgrade and make sure the health is good first.

 +1 but I would also recommend deciding if you actually need to use virtual
 nodes. The shuffle process can take a long time and people have had mixed
 experiences with it.

 If you wanted to move to 1.2 and get vnodes, I would consider spinning up a
 new cluster and bulk loading into it. You could do an initial load and then
 do delta loads using snapshots; there would, however, be a period of stale
 data in the new cluster until the last delta snapshot is loaded.
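
 A minimal sketch of that bulk-load step (host and paths are hypothetical):

     # Stream one column family's SSTables into the new cluster, pointing -d
     # at a live node there; repeat per column family directory:
     $ sstableloader -d 10.0.0.1 /path/to/snapshot/my_ks/my_cf/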

 Cheers

 -
 Aaron Morton
 Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 27/07/2013, at 3:36 AM, David McNelis dmcne...@gmail.com wrote:

 I second Romain, do the upgrade and make sure the health is good first.

 If you have or plan to have a large number of nodes, you might consider
 using fewer than 256 as your initial vnode count.  I think the number given
 in the docs is higher than reasonable; some people have reported potential
 performance degradation with a large number of nodes and a very high number
 of vnodes. If I had it to do over again, I'd have used 64 vnodes as my
 default (across 20 nodes).

 Another thing to be very cognizant of before shuffling is disk space.  You
 *must* have less than 50% used for the shuffle to succeed, because no data
 is removed (cleaned) from a node during the shuffle process, and the shuffle
 essentially doubles the amount of data until you're able to run a cleanup.


 On Fri, Jul 26, 2013 at 11:25 AM, Romain HARDOUIN 
 romain.hardo...@urssaf.fr wrote:

 Vnodes are a great feature. More nodes are involved during operations
 such as bootstrap, decommission, etc.
 The DataStax documentation is definitely a must-read.
 That said, if I were you, I'd wait a while before shuffling the ring.
 I'd focus on the cluster upgrade and on monitoring the nodes (number of
 file handles, memory usage, latency, etc.).
 Upgrading from 0.8 to 1.2 can be tricky; there have been so many changes
 since then. Be careful about the compaction strategies you choose, and
 double-check the options.

 Regards,
 Romain

rash aroskar rashmi.aros...@gmail.com wrote on 25/07/2013 23:25:11:

  From: rash aroskar rashmi.aros...@gmail.com
  To: user@cassandra.apache.org
  Date: 25/07/2013 23:25
  Subject: cassandra 1.2.5 - virtual nodes (num_tokens) pros/cons?
 
  Hi,
  I am upgrading my Cassandra cluster from 0.8 to 1.2.5.
  In Cassandra 1.2.5 the 'num_tokens' attribute confuses me.
  I understand that it distributes multiple tokens per node, but I am
  not clear how that helps performance or load balancing. Can anyone
  elaborate? Has anyone used this feature and know its
  advantages/disadvantages?
 
  Thanks,
  Rashmi






Re: How often to run `nodetool repair`

2013-08-01 Thread rash aroskar
We observed the same behavior. During the last repair, the data distribution
across nodes was imbalanced as well, resulting in one node bloating.
On Aug 1, 2013 12:36 PM, Carl Lerche m...@carllerche.com wrote:

 Hello,

 I read in the docs that `nodetool repair` should be run regularly unless no
 deletes are ever performed. In my app I never delete, but I heavily use the
 TTL feature. Should repair still be run regularly? Also, does repair take
 less time if it is run regularly? If not, is there a way to run it
 incrementally? It seems that when I do run repair, it takes a long time and
 causes high CPU usage and iowait.

 Thoughts?

 Thanks,
 Carl
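
 For reference, repair can be scoped to cut the per-run cost; a minimal
 sketch (keyspace name is hypothetical):

     # Repair only the ranges this node is primary for; run it on every node
     # in turn so the ring is covered without repairing each range RF times:
     $ nodetool repair -pr my_keyspace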



cassandra 1.2.5 - virtual nodes (num_tokens) pros/cons?

2013-07-25 Thread rash aroskar
Hi,
I am upgrading my Cassandra cluster from 0.8 to 1.2.5.
In Cassandra 1.2.5 the 'num_tokens' attribute confuses me.
I understand that it distributes multiple tokens per node, but I am not clear
how that helps performance or load balancing. Can anyone elaborate? Has
anyone used this feature and know its advantages/disadvantages?
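
For reference, vnodes are enabled per node in cassandra.yaml, set before the
node first joins the ring (a minimal sketch):

    $ grep num_tokens /etc/cassandra/conf/cassandra.yaml
    num_tokens: 256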

Thanks,
Rashmi