autoscaling cassandra cluster

2014-05-21 Thread Jabbar Azam
Hello,

Has anybody got a Cassandra cluster which autoscales depending on load or
times of the day?

I've seen the documentation on the DataStax website and that only mentioned
adding and removing nodes, unless I've missed something.

I want to know how to do this for Google Compute Engine. This isn't for
a production system but a test system (multiple nodes) where I want to
learn. I'm not sure how to check the performance of the cluster: whether I
use one performance metric or a mix of performance metrics, and then invoke
a script to add or remove nodes from the cluster.

I'd be interested to know whether people out there are autoscaling
cassandra on demand.
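A minimal sketch of the "mix of metrics" idea, in Java: scale up if any one metric crosses its limit, and scale down only when all of them are comfortably low, so the cluster does not flap between sizes. The metric names (`cpu`, `p99ReadMs`, `diskUsed`), the thresholds and the `ScalingPolicy` class are all hypothetical; the code only makes the decision and does not create or destroy GCE instances.

```java
import java.util.Map;

/** Hypothetical autoscaling decision; nothing here is a Cassandra or GCE API. */
class ScalingPolicy {
    enum Action { SCALE_UP, SCALE_DOWN, HOLD }

    // Illustrative thresholds only; tune them on your own test cluster.
    static final double CPU_HIGH = 0.75, CPU_LOW = 0.30;
    static final double P99_READ_MS_HIGH = 50.0;
    static final double DISK_USED_HIGH = 0.60;

    /** Scale up if ANY metric is hot; scale down only if ALL metrics are cold. */
    static Action decide(Map<String, Double> metrics, int nodes, int minNodes) {
        double cpu = metrics.getOrDefault("cpu", 0.0);
        double p99 = metrics.getOrDefault("p99ReadMs", 0.0);
        double disk = metrics.getOrDefault("diskUsed", 0.0);

        if (cpu > CPU_HIGH || p99 > P99_READ_MS_HIGH || disk > DISK_USED_HIGH) {
            return Action.SCALE_UP;
        }
        // Removing a node moves its data onto the others, so require every
        // signal to be well below its limit before shrinking.
        boolean cold = cpu < CPU_LOW
                && p99 < P99_READ_MS_HIGH / 2
                && disk < DISK_USED_HIGH / 2;
        if (cold && nodes > minNodes) {
            return Action.SCALE_DOWN;
        }
        return Action.HOLD; // the gap between the two rules avoids flapping
    }

    public static void main(String[] args) {
        Map<String, Double> sample = Map.of("cpu", 0.9, "p99ReadMs", 10.0, "diskUsed", 0.2);
        System.out.println(decide(sample, 3, 3)); // SCALE_UP
    }
}
```

Feeding it an average over several minutes, rather than a single sample, is another simple way to avoid reacting to short spikes.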

Thanks

Jabbar Azam


Re: autoscaling cassandra cluster

2014-05-21 Thread Prem Yadav
Hi Jabbar,
with vnodes, scaling up should not be a problem. You could just add
machines with the cluster/seed/datacenter conf and they should join the
cluster.
Scaling down has to be manual, where you drain the node and decommission it.

thanks,
Prem
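A sketch of how a script could drive the add/remove mechanics described above. It only builds the `nodetool` command lines (it does not run them) and assumes the commonly documented sequence: `nodetool cleanup` on each pre-existing node once a new node has finished joining, so they drop data they no longer own, and `nodetool decommission` on a node before its instance is destroyed, so its data is streamed to the remaining nodes. The class name and host addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds (but does not execute) nodetool invocations for scaling steps. */
class NodetoolPlan {
    /** After a new node has joined: clean up every node that was already in the ring. */
    static List<List<String>> afterScaleUp(List<String> existingHosts) {
        List<List<String>> cmds = new ArrayList<>();
        for (String host : existingHosts) {
            cmds.add(List.of("nodetool", "-h", host, "cleanup"));
        }
        return cmds;
    }

    /** Before destroying an instance: stream its data to the remaining nodes. */
    static List<List<String>> scaleDown(String leavingHost) {
        return List.of(List.of("nodetool", "-h", leavingHost, "decommission"));
    }

    public static void main(String[] args) {
        // Each inner list could be handed to new ProcessBuilder(cmd).inheritIO().start().
        afterScaleUp(List.of("10.0.0.1", "10.0.0.2")).forEach(System.out::println);
        scaleDown("10.0.0.3").forEach(System.out::println);
    }
}
```

Running cleanup on one node at a time keeps the extra compaction-style load down, which matters if the cluster is being scaled because it is already busy.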



On Wed, May 21, 2014 at 12:35 PM, Jabbar Azam aja...@gmail.com wrote:

Re: CqlStorage can't perform INSERTs with Pig?

2014-05-21 Thread James Schappet
In CQL, UPDATEs and INSERTs are the same thing (both are upserts).

You need to convert your INSERT statements to UPDATE statements.

Here is a quick example loading from a JSON file into two Cassandra tables.

Notice that the output query is URL-encoded.

a = load 'barcode_uuid_mapping_current.json'
using JsonLoader('uuidMapping:{(barcode:chararray,uuid:chararray)}');

result = foreach a GENERATE FLATTEN(uuidMapping); 

data_to_insert = FOREACH result GENERATE 
 TOTUPLE(
 TOTUPLE('barcode',barcode) 
 ),
  TOTUPLE( uuid ) ;
STORE data_to_insert INTO 
'cql://tcgadata/barcode_to_uuid?output_query=update%20barcode_to_uuid%20set%20uuid%20%3D%20%3F'
 USING CqlStorage();

data_to_insert = FOREACH result GENERATE 
 TOTUPLE(
 TOTUPLE('uuid',uuid) 
 ),
  TOTUPLE( barcode ) ;
STORE data_to_insert INTO 
'cql://tcgadata/uuid_to_barcode?output_query=update%20uuid_to_barcode%20set%20barcode%20%3D%20%3F'
 USING CqlStorage();
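Because the `output_query` is part of the `cql://` location, it has to be percent-encoded. A minimal Java sketch of building that location string, assuming the keyspace, table and query from the first STORE above; `URLEncoder` produces form encoding (space becomes `+`), so spaces are swapped for `%20` to match the form used in the example. The `CqlStorageUrl` class is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds a CqlStorage location with an encoded output_query. */
class CqlStorageUrl {
    static String build(String keyspace, String table, String cql) {
        try {
            // A literal '+' in the CQL is encoded as %2B first, so this replace
            // only touches the '+' characters that stand for spaces.
            String encoded = URLEncoder.encode(cql, "UTF-8").replace("+", "%20");
            return "cql://" + keyspace + "/" + table + "?output_query=" + encoded;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("tcgadata", "barcode_to_uuid",
                "update barcode_to_uuid set uuid = ?"));
        // cql://tcgadata/barcode_to_uuid?output_query=update%20barcode_to_uuid%20set%20uuid%20%3D%20%3F
    }
}
```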


There are some other examples here:
http://www.datastax.com/dev/blog/cql3-table-support-in-hadoop-pig-and-hive

and

http://www.schappet.com/pig_cassandra_bulk_load/




On May 20, 2014, at 10:02 PM, Kevin Burton bur...@spinn3r.com wrote:

 It seems that CqlStorage can't perform INSERTs when using Pig. Is there a
 reason for this?
 
 Here's the relevant code from 2.0.7:
 
 String cqlQuery = CqlConfigHelper.getOutputCql(conf).trim();
 if (cqlQuery.toLowerCase().startsWith("insert"))
     throw new UnsupportedOperationException("INSERT with CqlRecordWriter is not supported, please use UPDATE/DELETE statement");
 
 … It seems to me that DELETE and UPDATE are significantly less important
 than INSERT.
 
 My use case is that I'm using pig to build a custom secondary index, and then 
 loading it back into cassandra.
 
 -- 
 
 Founder/CEO Spinn3r.com
 Location: San Francisco, CA
 Skype: burtonator
 blog: http://burtonator.wordpress.com
 … or check out my Google+ profile
 
 War is peace. Freedom is slavery. Ignorance is strength. Corporations are 
 people.
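The guard quoted above is a case-insensitive prefix check on the trimmed output query, so the statement type is decided purely by its first word. The standalone Java copy below reproduces only that condition (the `OutputCqlGuard` class and `accepted` helper are hypothetical wrappers) to show why an UPDATE-based output query like the ones in the Pig example passes while an INSERT is rejected.

```java
/** Standalone copy of the prefix check quoted from CqlRecordWriter (2.0.7). */
class OutputCqlGuard {
    static void check(String outputCql) {
        String cqlQuery = outputCql.trim();
        if (cqlQuery.toLowerCase().startsWith("insert"))
            throw new UnsupportedOperationException(
                "INSERT with CqlRecordWriter is not supported, please use UPDATE/DELETE statement");
    }

    /** Hypothetical convenience wrapper: true if the query would be accepted. */
    static boolean accepted(String outputCql) {
        try {
            check(outputCql);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepted("update barcode_to_uuid set uuid = ?")); // true
        System.out.println(accepted("  INSERT INTO barcode_to_uuid (barcode, uuid) VALUES (?, ?)")); // false
    }
}
```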
 



Re: autoscaling cassandra cluster

2014-05-21 Thread Jabbar Azam
Hello Prem,

I'm trying to find out whether people are autoscaling up and down
automatically, not manually. I'm also interested in whether they are using
a cloud based solution and creating and destroying instances.

I've found the following regarding GCE and how instances can be created and
destroyed:
https://cloud.google.com/developers/articles/auto-scaling-on-the-google-cloud-platform


Thanks

Jabbar Azam


On 21 May 2014 13:09, Prem Yadav ipremya...@gmail.com wrote:


Re: autoscaling cassandra cluster

2014-05-21 Thread Panagiotis Garefalakis
I agree with Prem, but recently someone sent this promising project for
running Cassandra on Mesos to this list:
https://github.com/mesosphere/cassandra-mesos
One of its goals is to make scaling easier.
I don’t have any personal opinion on it yet, but maybe you could give it a try.

Regards,
Panagiotis



On Wed, May 21, 2014 at 3:49 PM, Jabbar Azam aja...@gmail.com wrote:



Re: autoscaling cassandra cluster

2014-05-21 Thread Jabbar Azam
That sounds interesting. I was thinking of using CoreOS with Docker
containers for the business logic, frontend and Cassandra. I'll also have a
look at cassandra-mesos.

Thanks

Jabbar Azam
On 21 May 2014 14:04, Panagiotis Garefalakis panga...@gmail.com wrote:



Re: autoscaling cassandra cluster

2014-05-21 Thread James Horey
If you're interested in (or need) some Cassandra Docker images, let me know and
I'll shoot you a link.

James

Sent from my iPhone

 On May 21, 2014, at 10:19 AM, Jabbar Azam aja...@gmail.com wrote:


Re: Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-21 Thread Phil Luckhurst
I'm wondering if the lack of response to this means it was a dumb question;
however, I've searched the documentation again and I still can't find an
answer :-(

Phil



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-SSTables-overlap-with-SizeTieredCompactionStrategy-tp7594574p7594627.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Data locality with cash

2014-05-21 Thread Jens Rantil
Hi,

I've had a look at the Hive plugin for Cassandra[1]. Does anyone know if it
supports data locality if I install task trackers and job trackers on my
Cassandra instances?

[1] https://github.com/tuplejump/cash

Thanks,
Jens


Re: Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-21 Thread Prem Yadav
I would think it's because of the index and filter files, and also the
additional data that gets added by serialization. Also, since SSTables are
only deleted after the compaction is finished, it is possible that when you
checked, the old SSTables had not yet been deleted.

However, 50% additional disk usage does sound bad.
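To put numbers on the point about SSTables only being deleted after compaction: a size-tiered compaction (which by default waits for four similarly sized SSTables) writes the merged SSTable before deleting its inputs, so at the end of the compaction both are on disk. A rough Java sketch of that transient peak, assuming the worst case where nothing is purged and the output is as large as the inputs combined; the `CompactionHeadroom` class and the sizes in `main` are made up.

```java
/** Rough worst-case disk peak while one compaction runs (sizes in bytes). */
class CompactionHeadroom {
    static long peakBytes(long[] allSstables, long[] compactedInputs) {
        long live = 0;
        for (long s : allSstables) live += s;
        long output = 0;
        for (long s : compactedInputs) output += s; // worst case: nothing purged
        // The inputs stay on disk until the new SSTable is complete.
        return live + output;
    }

    public static void main(String[] args) {
        long gb = 1L << 30;
        long[] all = {4 * gb, 4 * gb, 4 * gb, 4 * gb, 1 * gb};
        long[] bucket = {4 * gb, 4 * gb, 4 * gb, 4 * gb}; // four similar-sized tables
        System.out.println(peakBytes(all, bucket) / gb); // 33
    }
}
```

So 17 GB of SSTables can briefly need about 33 GB, which is why the usual advice is to leave a large share of the disk free under size-tiered compaction.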


On Wed, May 21, 2014 at 4:42 PM, Phil Luckhurst 
phil.luckhu...@powerassure.com wrote:


RE: Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-21 Thread Andreas Finke
Hi Phil,

there is no dumb question ;) What is your size estimate based on, e.g. what
size is a column in your calculation?
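The answer to that question matters because, in the pre-3.0 storage format, every column carries fixed overhead on top of its name and value. A rough Java sketch of a raw-size estimate; the overhead constants (15 bytes per regular column, 23 per row) are taken from DataStax's capacity-planning documentation for the 1.x/2.0 releases and should be treated as assumptions to verify for your version. Indexes, bloom filters, compression and replication are ignored, and the 8-byte name/value sizes in `main` are hypothetical.

```java
/** Rough raw-size estimate for one partition in the pre-3.0 storage format. */
class RowSizeEstimate {
    // Assumed overheads (DataStax 1.x/2.0 capacity-planning docs); verify for your version.
    static final int COLUMN_OVERHEAD = 15; // regular column; counter/expiring columns use 23
    static final int ROW_OVERHEAD = 23;

    static long partitionBytes(long columns, int nameBytes, int valueBytes) {
        return ROW_OVERHEAD + columns * (nameBytes + valueBytes + (long) COLUMN_OVERHEAD);
    }

    public static void main(String[] args) {
        long perYear = 60L * 24 * 365; // one column per minute: 525,600
        // Hypothetical sizes: 8-byte timestamp name, 8-byte double value.
        System.out.println(partitionBytes(perYear, 8, 8)); // 23 + 525600 * 31 = 16293623
    }
}
```

With these assumptions the overhead alone is almost as large as the names and values combined, which is one place an estimate based only on data sizes can go wrong.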

From: Phil Luckhurst [phil.luckhu...@powerassure.com]
Sent: Wednesday, May 21, 2014 5:42 PM
To: cassandra-u...@incubator.apache.org
Subject: Re: Can SSTables overlap with SizeTieredCompactionStrategy?


Re: ownership not equally distributed

2014-05-21 Thread Rameez Thonnakkal
This issue is resolved, though I don't know the exact root cause.
I re-imaged the server that had the lower token ownership and redid
the configuration through Chef.

Thanks,
Rameez



On Sat, May 17, 2014 at 1:06 AM, Rameez Thonnakkal ssram...@gmail.com wrote:

 Hello

 I have a 4-node cluster where 2 nodes are in one data center and
 the other 2 are in a different one.

 But in the first data center the token ownership is not equally
 distributed. I am using the vnode feature.

 num_tokens is set to 256 on all nodes.
 initial_token is left blank.

 Datacenter: DC1
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address        Load       Tokens  Owns   Host ID                               Rack
 UN  10.145.84.167  84.58 MB   256     0.4%   ce5ddceb-b1d4-47ac-8d85-249aa7c5e971  RAC1
 UN  10.145.84.166  692.69 MB  255     44.2%  e6b5a0fd-20b7-4bf9-9a8e-715cfc823be6  RAC1

 Datacenter: DC2
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address        Load       Tokens  Owns   Host ID                               Rack
 UN  10.168.67.43   476 MB     256     27.8%  05dc7ea6-0328-43b8-8b70-bcea856ba41e  RAC1
 UN  10.168.67.42   413.15 MB  256     27.7%  677025f0-780c-45dc-bb3b-17ad260fba7d  RAC1


 I have run nodetool repair a couple of times, but it didn't help.

 On the node with less ownership, I have seen frequent full GCs and had to
 restart Cassandra a couple of times.

 Any suggestions on how to resolve this are highly appreciated.

 Regards,
 Rameez
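With vnodes, a node's Owns figure is the share of the token ring covered by the ranges that end at its tokens. A Java sketch of that calculation, assuming the Murmur3Partitioner ring (the default since 1.2, tokens from -2^63 to 2^63-1) and raw token ownership with no replication applied, which is consistent with the four percentages above adding up to about 100%. The `TokenOwnership` class and the three-token ring in `main` are hypothetical.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Raw token ownership on a Murmur3 ring: each token owns (previous token, token]. */
class TokenOwnership {
    static final BigInteger RING = BigInteger.ONE.shiftLeft(64); // 2^64 possible tokens

    static Map<String, Double> ownership(TreeMap<Long, String> tokenToNode) {
        Map<String, BigInteger> owned = new HashMap<>();
        long prev = tokenToNode.lastKey(); // the lowest token's range wraps around from the highest
        for (Map.Entry<Long, String> e : tokenToNode.entrySet()) {
            BigInteger width = BigInteger.valueOf(e.getKey()).subtract(BigInteger.valueOf(prev));
            if (width.signum() <= 0) {
                width = width.add(RING); // wrap-around (a lone token owns the whole ring)
            }
            owned.merge(e.getValue(), width, BigInteger::add);
            prev = e.getKey();
        }
        Map<String, Double> share = new HashMap<>();
        owned.forEach((node, w) -> share.put(node, w.doubleValue() / RING.doubleValue()));
        return share;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>(); // hypothetical 3-token ring
        ring.put(Long.MIN_VALUE / 2, "A");
        ring.put(0L, "B");
        ring.put(Long.MAX_VALUE / 2, "A");
        System.out.println(ownership(ring)); // A: 0.75, B: 0.25 (print order may vary)
    }
}
```

With 256 randomly chosen tokens per node, a share as small as 0.4% is very unlikely, which suggests the tokens on that node were not assigned the usual way. Effective ownership for a keyspace (`nodetool status <keyspace>`) is a different number, since it accounts for replication.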




Re: autoscaling cassandra cluster

2014-05-21 Thread Ben Bromhead
The mechanics for it are simple compared to figuring out when to scale, 
especially when you want to be scaling before peak load on your cluster (adding 
and removing nodes puts additional load on your cluster).

We are currently building our own in-house solution for this for our customers. 
If you want to have a go at it yourself, this is a good starting point:

http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html
http://techblog.netflix.com/2013/12/scryer-netflixs-predictive-auto-scaling.html

Most of this is fairly specific to Netflix, but an interesting read nonetheless.

DataStax OpsCenter also provides capacity planning and forecasting, and can
provide an easy set of metrics on which to base your scaling decisions.

http://www.datastax.com/what-we-offer/products-services/datastax-opscenter 
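Since nodes have to be added before the peak (bootstrap streaming adds load of its own), the decision needs a forecast rather than the current reading. The Java sketch below is a deliberately naive stand-in, not Netflix's Scryer algorithm: it forecasts an hour's load as the average of the same hour on previous days, adds a headroom factor, and converts that into a node count. The `PredictiveSizing` class, throughput per node, headroom and minimum cluster size are hypothetical inputs you would measure on your own cluster.

```java
/** Naive seasonal forecast: average of the same hour on previous days. */
class PredictiveSizing {
    /** history[day][hour] = observed ops/sec; returns the forecast for hourOfDay. */
    static double forecast(double[][] history, int hourOfDay) {
        double sum = 0;
        for (double[] day : history) {
            sum += day[hourOfDay];
        }
        return sum / history.length;
    }

    /** Nodes needed for the forecast, with headroom for bootstrap/streaming load. */
    static int nodesNeeded(double forecastOps, double opsPerNode, double headroom, int minNodes) {
        int n = (int) Math.ceil(forecastOps * (1 + headroom) / opsPerNode);
        return Math.max(n, minNodes);
    }

    public static void main(String[] args) {
        double[][] history = {
            {1000, 4000}, // day 1: hour 0, hour 1
            {1200, 4400}, // day 2
        };
        double f = forecast(history, 1); // 4200.0
        // Size for hour 1 ahead of time, since joining nodes add load themselves.
        System.out.println(nodesNeeded(f, 1500, 0.25, 3)); // ceil(5250 / 1500) = 4
    }
}
```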

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359




On 21/05/2014, at 7:51 AM, James Horey j...@opencore.io wrote:



How to enable a Cassandra node to participate in multiple cluster

2014-05-21 Thread Salih Kardan
Hello everyone,

I want to use a Cassandra cluster for a specific purpose across data
centers. What I want to figure out is how I can enable a single Cassandra
node to participate in multiple clusters at the same time. I googled it,
but I could not find any use case of Cassandra like the one I describe. Is
this possible with the current architecture of Cassandra?

Salih


Re: How to enable a Cassandra node to participate in multiple cluster

2014-05-21 Thread Jabbar Azam
Hello Salih,

As far as I'm aware, a node can't be in two clusters. In the cassandra.yaml
file you can only specify one cluster (cluster_name). The storage system and
all the protocols would have to be modified so that information about
multiple clusters is passed around. I'm sure somebody else could give you
more accurate detail.

If you're saving on hardware, then you could think about using Docker or
virtualisation, but you'll have problems with performance, a bit like the
problems you get with small instances at Amazon.

Thanks

Jabbar Azam
On 21 May 2014 19:07, Salih Kardan karda...@gmail.com wrote:


Re: autoscaling cassandra cluster

2014-05-21 Thread Jabbar Azam
Hello James,

How do you alter your cassandra.yaml file with each node's IP address?

I want to use the scaling software (which I've not got yet) to create and
destroy the GCE instances. I want to use fleet to deploy and undeploy the
Cassandra nodes inside the Docker containers. I do realise I will have to
run nodetool to add and remove the nodes from the cluster and also to run
node cleanup.

Disclaimer: this is not a production system but something I'm experimenting
with in my own time.
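One common approach is to render the node-specific settings from a template when the container starts. A Java sketch of that templating step: `listen_address` and `rpc_address` get the node's own IP, while `cluster_name`, the seed list, `num_tokens` and `endpoint_snitch` are shared by every node. The `CassandraYamlFragment` class, cluster name, IPs and snitch choice are placeholders; with `GossipingPropertyFileSnitch` the data center and rack come from `cassandra-rackdc.properties`, and in practice these keys would be merged into a full cassandra.yaml rather than used on their own.

```java
import java.util.List;

/** Renders the per-node part of cassandra.yaml; all values are placeholders. */
class CassandraYamlFragment {
    static String render(String clusterName, List<String> seeds, String nodeIp) {
        return String.join("\n",
            "cluster_name: '" + clusterName + "'",
            "num_tokens: 256",
            "seed_provider:",
            "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider",
            "      parameters:",
            "          - seeds: \"" + String.join(",", seeds) + "\"",
            "listen_address: " + nodeIp,
            "rpc_address: " + nodeIp,
            "endpoint_snitch: GossipingPropertyFileSnitch",
            ""); // trailing newline
    }

    public static void main(String[] args) {
        // The node's own IP would come from instance metadata or the container environment.
        System.out.print(render("test-cluster", List.of("10.240.0.2", "10.240.0.3"), "10.240.0.7"));
    }
}
```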


Thanks

Jabbar Azam


On 21 May 2014 15:51, James Horey j...@opencore.io wrote:



Re: autoscaling cassandra cluster

2014-05-21 Thread Jabbar Azam
Hello Ben,

I'm looking forward to reading the Netflix links. Thanks :)


Thanks

Jabbar Azam


On 21 May 2014 18:08, Ben Bromhead b...@instaclustr.com wrote:



Re: Is the tarball for a given release in a Maven repository somewhere?

2014-05-21 Thread Lewis John Mcgibbney
Hi Clint,

On Wed, May 21, 2014 at 5:29 AM, user-digest-h...@cassandra.apache.org wrote:


 Is the tarball for a given release in a Maven repository somewhere?

 Hi all,


...snip


 I poked around online and could not find what I was looking for.  Any
 help would be appreciated!


A Maven repo? No. Currently tar.gz builds are not generated, signed and sent
to Maven Central as part of the Cassandra release process.
This is an ideal opportunity for someone to step in and improve this
process... log a Jira ticket, as it would only benefit people if they can
find these artifacts in Maven Central as well.

You can, however, ALWAYS find archived releases of Cassandra (as with EVERY
Apache release) under archive.apache.org.
Specifically, you will be looking here:
http://archive.apache.org/dist/cassandra/
It is up to the Cassandra release manager and the PMC generally to keep the
main distribution server clean and uncluttered, which is why non-current
releases are automatically moved to archive.a.o.
hth
Lewis
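Following Lewis's pointer, a release tarball can be located in the archive by version. A Java sketch, assuming the archive's usual layout of `dist/cassandra/<version>/apache-cassandra-<version>-bin.tar.gz` (check the directory listing for the release you need) and using HTTPS; the `CassandraArchive` class is hypothetical. Verify the signature and checksum files published alongside each tarball before using a download.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for fetching a release tarball from archive.apache.org. */
class CassandraArchive {
    static final String BASE = "https://archive.apache.org/dist/cassandra/";

    /** Assumed layout: BASE + version + "/apache-cassandra-" + version + "-bin.tar.gz" */
    static String tarballUrl(String version) {
        return BASE + version + "/apache-cassandra-" + version + "-bin.tar.gz";
    }

    /** Downloads the tarball into dir and returns its path (needs network access). */
    static Path download(String version, Path dir) throws IOException {
        Path target = dir.resolve("apache-cassandra-" + version + "-bin.tar.gz");
        try (InputStream in = new URL(tarballUrl(version)).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tarballUrl("2.0.7"));
        // download("2.0.7", Path.of(".")); // uncomment to actually fetch it
    }
}
```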


Fwd: Cassandra pre 2.1 vs 2.1 counter implementation

2014-05-21 Thread Localhost shell
Hey All,


I am new to the C* community.

We are planning to use DataStax C* (pre 2.1) in production. We use
counters heavily; that is mostly what we do, apart from storing a few
months of raw logs in C*.

I have gone through Sylvain Lebresne's excellent presentation
http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf
and the design doc
https://issues.apache.org/jira/secure/attachment/12459754/Partitionedcountersdesigndoc.pdf
for the pre-2.1 counter implementation in detail. Unfortunately the video is
not available.


Questions:

1. I know from https://issues.apache.org/jira/browse/CASSANDRA-6504 that the
counters design has changed in 2.1, but I'm not able to get hold of the
design doc. Can someone please share the design docs?

Also, according to the 6504 issue and the C* codebase (CounterMutation.apply()),
C* 2.1 has introduced locks that did not exist before.

How does this impact the write performance of counters compared to the
lock-free pre-2.1 partitioned counter implementation?

2. What were the major concerns (other than idempotency and overcounts due
to timeout exceptions) in the pre-2.1 counter architecture that led to a
rewrite of the counter implementation?

Thanks for the help.

--Unilocal


Re: Cassandra pre 2.1 vs 2.1 counter implementation

2014-05-21 Thread Nate McCall
Jonathan covered the changes in some detail at one of our recent meetups
(at about 36 minutes in, give or take):

http://capitalfactory.lifesize.com/videos/video/309/?access_token=shr0003098845257289498283770596639066969
From:
http://www.meetup.com/Austin-Cassandra-Users/events/158857962/

tl;dr: They will be slower, but accurate.
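A toy model of why "slower, but accurate" follows from adding locks: if an increment is applied as read the current value, add the delta, write the result, then concurrent increments of the same counter have to be serialised or one can overwrite another. The Java sketch below guards each read-modify-write with one of a fixed set of striped locks chosen by the counter key's hash. The `StripedCounters` class is hypothetical; it is a conceptual illustration under that assumption, not Cassandra's `CounterMutation` code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model: read-before-write counter increments guarded by striped locks. */
class StripedCounters {
    private final ReentrantLock[] stripes = new ReentrantLock[64];
    private final ConcurrentHashMap<String, Long> values = new ConcurrentHashMap<>();

    StripedCounters() {
        for (int i = 0; i < stripes.length; i++) stripes[i] = new ReentrantLock();
    }

    void increment(String counterKey, long delta) {
        ReentrantLock lock = stripes[Math.floorMod(counterKey.hashCode(), stripes.length)];
        lock.lock(); // serialises writers of the same counter (and of keys sharing a stripe)
        try {
            long current = values.getOrDefault(counterKey, 0L); // read ...
            values.put(counterKey, current + delta);            // ... then write
        } finally {
            lock.unlock();
        }
    }

    long get(String counterKey) {
        return values.getOrDefault(counterKey, 0L);
    }

    public static void main(String[] args) throws InterruptedException {
        StripedCounters c = new StripedCounters();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 10_000; i++) pool.execute(() -> c.increment("page:home", 1));
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(c.get("page:home")); // 10000
    }
}
```

Without the lock, two threads could read the same current value and both write current + 1, losing an increment; with it, writers of one counter wait for each other, which is where the extra latency comes from.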


On Wed, May 21, 2014 at 3:15 PM, Localhost shell 
universal.localh...@gmail.com wrote:


-- 
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Is the tarball for a given release in a Maven repository somewhere?

2014-05-21 Thread Clint Kelly
Thanks, Lewis.  I created a ticket here:

https://issues.apache.org/jira/browse/CASSANDRA-7283

For now I just copied the cassandra and cassandra.in.sh scripts
into my project, along with custom configuration files.  We already
have all of the necessary JARs in our project's lib directory, since
our code depends on them anyway.

Best regards,
Clint

On Wed, May 21, 2014 at 1:02 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
 Hi Clint,

 On Wed, May 21, 2014 at 5:29 AM, user-digest-h...@cassandra.apache.org
 wrote:


 Is the tarball for a given release in a Maven repository somewhere?

 Hi all,


 ...snip


 I poked around online and could not find what I was looking for.  Any
 help would be appreciated!


 A Maven repository? No. Currently, tar.gz builds are not generated, signed
 and pushed to Maven Central as part of the Cassandra release process.
 This is an ideal opportunity for someone to step in and improve this
 process... log a Jira ticket, as people would benefit from being able to
 find these artifacts in Maven Central as well.

 You can, however, ALWAYS find archived releases of Cassandra (as with EVERY
 Apache release) under archive.apache.org.
 Specifically, you will be looking here:
 http://archive.apache.org/dist/cassandra/
 It is up to the Cassandra release manager, and the PMC generally, to keep the
 main distribution server clean and uncluttered, which is why non-current
 releases are automatically moved to archive.a.o.
 hth
 Lewis
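For fetching an archived release programmatically, a small Python helper can build the tarball URL. The per-version layout `<version>/apache-cassandra-<version>-bin.tar.gz` is an assumption based on how the archive directory is usually organised, so check the directory listing for the release you need; the helper names are hypothetical.

```python
import os
import urllib.request

ARCHIVE_BASE = "http://archive.apache.org/dist/cassandra"


def release_tarball_url(version: str) -> str:
    # Assumed layout: one directory per version holding the -bin tarball.
    return f"{ARCHIVE_BASE}/{version}/apache-cassandra-{version}-bin.tar.gz"


def download_release(version: str, dest_dir: str = ".") -> str:
    # Download the tarball and return its local path. Verify it against the
    # signature/checksum files published next to it before using it.
    url = release_tarball_url(version)
    path = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
    urllib.request.urlretrieve(url, path)
    return path
```

This is the same kind of step Clint describes doing by hand: pull a specific release and copy the scripts and configuration he needs into the project.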


RE: Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-21 Thread Phil Luckhurst
We based the estimate on a previous controlled observation. We generated a
year's worth of one minute data for a single identifier and recorded the
size of the resulting sstable. By adding the data one month at a time, we
observed a linear, predictable increase in the sstable size.
Using this, we simply multiplied by the number of identifiers (700 in this
case) to get the 7GB estimate.
As noted above, this estimate is correct once the data is compacted into
one sstable, but it is wrong when there are multiple sstables.
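The extrapolation is simple linear arithmetic. Working backwards from the figures in the thread (7GB total, 700 identifiers, one year of one-minute data) gives the implied per-identifier and per-sample sizes. Treating 7GB as 7 × 10^9 bytes and a year as 365 days are assumptions; estimate_total_bytes is a hypothetical helper.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 one-minute samples per identifier


def estimate_total_bytes(bytes_per_identifier_year: float, identifiers: int) -> float:
    # Linear model from the thread: every identifier contributes the same amount.
    return bytes_per_identifier_year * identifiers


TOTAL_BYTES = 7 * 10**9  # the 7GB estimate (decimal GB assumed)
IDENTIFIERS = 700

per_identifier_year = TOTAL_BYTES / IDENTIFIERS              # 10 MB per identifier
per_identifier_month = per_identifier_year / 12              # ~0.83 MB per month
bytes_per_sample = per_identifier_year / MINUTES_PER_YEAR    # ~19 bytes per sample

print(f"{per_identifier_year:,.0f} bytes/identifier/year")
print(f"{per_identifier_month:,.0f} bytes/identifier/month")
print(f"{bytes_per_sample:.1f} bytes/sample")
```

Comparing the implied per-sample figure with the on-disk size of a single column is one way to sanity-check the estimate, which is what Andreas's question is getting at.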

Phil


Andreas Finke wrote
 Hi Phil,
 
 there is no dumb question ;) What is your size estimate based on, e.g.
 what size is a column in your calculation?





--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-SSTables-overlap-with-SizeTieredCompactionStrategy-tp7594574p7594641.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


RE: Can SSTables overlap with SizeTieredCompactionStrategy?

2014-05-21 Thread DuyHai Doan
Are you sure there is no TTL set on your data? It might explain the shrink
in sstable size after compaction.
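As a rough illustration of why compaction can shrink data, here is a toy merge in Python: when several sstables are merged, only the newest version of each cell survives, and TTL'd cells whose expiry plus gc_grace_seconds has passed are dropped. This simplifies Cassandra's real behaviour (an expired cell is first tracked as a tombstone, and purging also depends on other sstables); the Cell type and compact function are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Cell:
    key: str
    value: str
    write_ts: int              # write timestamp, in seconds
    ttl: Optional[int] = None  # time-to-live in seconds; None means no TTL


def compact(sstables: Iterable[Iterable[Cell]], now: int, gc_grace_seconds: int) -> List[Cell]:
    # Keep only the newest version of each key across all input sstables.
    newest = {}
    for sstable in sstables:
        for cell in sstable:
            current = newest.get(cell.key)
            if current is None or cell.write_ts > current.write_ts:
                newest[cell.key] = cell

    survivors = []
    for cell in newest.values():
        # Simplification: a TTL'd cell whose expiry plus gc_grace_seconds has
        # passed is dropped outright instead of going through a tombstone.
        if cell.ttl is not None and cell.write_ts + cell.ttl + gc_grace_seconds <= now:
            continue
        survivors.append(cell)
    return sorted(survivors, key=lambda c: c.key)
```

Merging `[Cell("k1", "v1", 100), Cell("k2", "v2", 100, ttl=10)]` with `[Cell("k1", "v1b", 200)]` at `now=1000` and `gc_grace_seconds=100` leaves a single cell: the overwritten k1 and the long-expired k2 both disappear.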
On 21 May 2014 at 23:17, Phil Luckhurst phil.luckhu...@powerassure.com
wrote:




Re: autoscaling cassandra cluster

2014-05-21 Thread James Horey
You normally don't (ferry auto-generates the IP addresses). Let's move this 
conversation to the ferry-user google group so that we don't pollute this 
mailing list...
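Outside of ferry, one approach is to render the per-node settings from a template when each container starts. Below is a minimal Python sketch, assuming the node's own IP and the seed list are known at start-up (for example from an environment variable set by the scheduler; NODE_IP is a hypothetical name). render_node_config is a hypothetical helper; cluster_name, num_tokens, seed_provider, listen_address and rpc_address are standard cassandra.yaml settings. The output is only a fragment to combine with the rest of a base cassandra.yaml.

```python
import os


def render_node_config(cluster_name: str, node_ip: str, seed_ips: list, num_tokens: int = 256) -> str:
    """Render the per-node part of cassandra.yaml (hypothetical helper)."""
    seeds = ",".join(seed_ips)
    return (
        f"cluster_name: '{cluster_name}'\n"
        f"num_tokens: {num_tokens}\n"
        "seed_provider:\n"
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
        "      parameters:\n"
        f"          - seeds: \"{seeds}\"\n"
        # listen_address is used for inter-node traffic, rpc_address for clients.
        f"listen_address: {node_ip}\n"
        f"rpc_address: {node_ip}\n"
    )


if __name__ == "__main__":
    # NODE_IP is a hypothetical variable the container scheduler would set.
    print(render_node_config("test-cluster", os.environ.get("NODE_IP", "10.0.0.5"),
                             ["10.0.0.2", "10.0.0.3"]))
```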

James

Sent from my iPhone

 On May 21, 2014, at 3:15 PM, Jabbar Azam aja...@gmail.com wrote:
 
 Hello James,
 
 How do you alter your cassandra.yaml file with each node's IP address?
 
 I want to use the scaling software (which I've not got yet) to create and 
 destroy the GCE instances. I want to use fleet to deploy and undeploy the 
 cassandra nodes inside the docker instances. I do realise I will have to run 
 nodetool to add and remove the nodes from the cluster, and also to run node 
 cleanup.
 
 Disclaimer: this is not a production system but something I'm experimenting 
 with in my own time.
 
 
 Thanks
 
 Jabbar Azam
 
 
 On 21 May 2014 15:51, James Horey j...@opencore.io wrote:
 If you're interested in and/or need some Cassandra Docker images, let me know 
 and I'll shoot you a link.
 
 James
 
 Sent from my iPhone
 
 On May 21, 2014, at 10:19 AM, Jabbar Azam aja...@gmail.com wrote:
 
 That sounds interesting. I was thinking of using CoreOS with Docker 
 containers for the business logic, frontend and Cassandra. I'll also have a 
 look at cassandra-mesos.
 
 Thanks
 
 Jabbar Azam
 
 On 21 May 2014 14:04, Panagiotis Garefalakis panga...@gmail.com wrote:
 I agree with Prem, but recently someone sent this promising Mesos-based 
 project to this list:
 https://github.com/mesosphere/cassandra-mesos
 One of its goals is to make scaling easier. 
 I don't have a personal opinion on it yet, but maybe you could give it a try.
 
 Regards,
 Panagiotis
 
 
 
 On Wed, May 21, 2014 at 3:49 PM, Jabbar Azam aja...@gmail.com wrote:
 Hello Prem,
 
 I'm trying to find out whether people are autoscaling up and down 
 automatically, not manually. I'm also interested in whether they are 
 using a cloud based solution and creating and destroying instances. 
 
 I've found the following regarding GCE 
 https://cloud.google.com/developers/articles/auto-scaling-on-the-google-cloud-platform
  and how instances can be created and destroyed. 
 
 
 
 Thanks
 
 Jabbar Azam
 
 
 On 21 May 2014 13:09, Prem Yadav ipremya...@gmail.com wrote:
 Hi Jabbar,
 with vnodes, scaling up should not be a problem. You could just add a 
 machine with the cluster/seed/datacenter conf and it should join the 
 cluster.
 Scaling down has to be manual: you drain the node and decommission 
 it.
 
 thanks,
 Prem
 
 
 
 On Wed, May 21, 2014 at 12:35 PM, Jabbar Azam aja...@gmail.com wrote:
 Hello,
 
 Has anybody got a cassandra cluster which autoscales depending on load 
 or times of the day?
 
 I've seen the documentation on the datastax website and that only 
 mentioned adding and removing nodes, unless I've missed something.
 
 I want to know how to do this for Google Compute Engine. This isn't 
 for a production system but a test system (multiple nodes) where I want 
 to learn. I'm not sure how to check the performance of the cluster, 
 whether to use one performance metric or a mix of performance metrics, 
 and then invoke a script to add or remove nodes from the cluster.
 
 I'd be interested to know whether people out there are autoscaling 
 cassandra on demand.
 
 Thanks
 
 Jabbar Azam
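On the question of one metric versus a mix, one way to combine them is a decision function with separate scale-up and scale-down thresholds, so the cluster does not flap between sizes. The Python sketch below is hypothetical: the metric names, thresholds and node limits are assumptions to tune, not recommendations. A "remove" decision would still have to be carried out by draining and decommissioning a node, as Prem describes.

```python
from dataclasses import dataclass


@dataclass
class ClusterMetrics:
    avg_cpu: float       # mean CPU utilisation across nodes, 0.0 to 1.0
    p99_write_ms: float  # 99th percentile write latency, milliseconds
    node_count: int


def scaling_decision(m: ClusterMetrics,
                     cpu_high: float = 0.75,
                     cpu_low: float = 0.30,
                     latency_high_ms: float = 20.0,
                     min_nodes: int = 3,
                     max_nodes: int = 12) -> str:
    """Return "add", "remove" or "hold" (hypothetical thresholds)."""
    # Either signal alone is enough to call the cluster overloaded.
    overloaded = m.avg_cpu > cpu_high or m.p99_write_ms > latency_high_ms
    # Both signals must be comfortably low before shrinking; the gap between
    # the high and low thresholds keeps the cluster from flapping.
    idle = m.avg_cpu < cpu_low and m.p99_write_ms < latency_high_ms / 2
    if overloaded and m.node_count < max_nodes:
        return "add"
    if idle and m.node_count > min_nodes:
        return "remove"
    return "hold"
```

A script polling this function on a schedule could then call the GCE and fleet tooling Jabbar mentions to create or destroy one instance at a time.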