autoscaling cassandra cluster
Hello, Has anybody got a Cassandra cluster which autoscales depending on load or time of day? I've seen the documentation on the DataStax website, but that only mentions adding and removing nodes, unless I've missed something. I want to know how to do this on Google Compute Engine. This isn't for a production system but a test system (multiple nodes) where I want to learn. I'm not sure how to check the performance of the cluster: whether I should use one performance metric or a mix of performance metrics, and then invoke a script to add or remove nodes from the cluster. I'd be interested to know whether people out there are autoscaling Cassandra on demand. Thanks Jabbar Azam
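The "one metric or a mix of metrics" question could start as simply as a weighted threshold check. A minimal sketch (the metric names, normalization to 0.0-1.0, and thresholds here are all invented for illustration, not recommendations):

```python
def scale_decision(metrics, up_threshold=0.75, down_threshold=0.25):
    """Combine several normalized metrics (each 0.0-1.0) into one
    scaling decision by simple averaging.

    `metrics` might map e.g. "cpu" or "pending_compactions" (as a
    fraction of some cap) to current values; both names and the
    averaging scheme are illustrative assumptions.
    """
    if not metrics:
        return "hold"
    load = sum(metrics.values()) / len(metrics)
    if load > up_threshold:
        return "scale-up"
    if load < down_threshold:
        return "scale-down"
    return "hold"

# High CPU plus a deep compaction backlog triggers a scale-up:
print(scale_decision({"cpu": 0.9, "pending_compactions": 0.8}))  # scale-up
```

A real setup would feed this from JMX/nodetool metrics and then call out to the cloud provider's API, but the decision function itself can stay this small.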
Re: autoscaling cassandra cluster
Hi Jabbar, with vnodes, scaling up should not be a problem. You could just add a machine with the cluster/seed/datacenter config and it should join the cluster. Scaling down has to be manual: you drain the node and decommission it. Thanks, Prem
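The manual scale-down sequence Prem describes could be scripted; a sketch that just builds the command lines (the host address is made up, and you'd hand each command to `subprocess.run` against a real cluster):

```python
def scale_down_commands(host):
    """Return the nodetool commands to remove `host` from the ring:
    drain flushes memtables and stops the node accepting writes,
    decommission streams its data to the remaining replicas."""
    return [
        f"nodetool -h {host} drain",
        f"nodetool -h {host} decommission",
    ]

for cmd in scale_down_commands("10.0.0.5"):
    print(cmd)
```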
Re: CqlStorage can't perform INSERTs with Pig?
In CQL, updates and inserts are the same thing, so you need to convert your INSERT statements to UPDATE. Here is a quick example loading from a JSON file into two Cassandra tables. Notice that the output query is URL-encoded.

a = load 'barcode_uuid_mapping_current.json' using JsonLoader('uuidMapping:{(barcode:chararray,uuid:chararray)}');
result = foreach a GENERATE FLATTEN(uuidMapping);
data_to_insert = FOREACH result GENERATE TOTUPLE( TOTUPLE('barcode',barcode) ), TOTUPLE( uuid );
STORE data_to_insert INTO 'cql://tcgadata/barcode_to_uuid?output_query=update%20barcode_to_uuid%20set%20uuid%20%3D%20%3F' USING CqlStorage();
data_to_insert = FOREACH result GENERATE TOTUPLE( TOTUPLE('uuid',uuid) ), TOTUPLE( barcode );
STORE data_to_insert INTO 'cql://tcgadata/uuid_to_barcode?output_query=update%20uuid_to_barcode%20set%20barcode%20%3D%20%3F' USING CqlStorage();

There are some other examples here: http://www.datastax.com/dev/blog/cql3-table-support-in-hadoop-pig-and-hive and http://www.schappet.com/pig_cassandra_bulk_load/

On May 20, 2014, at 10:02 PM, Kevin Burton bur...@spinn3r.com wrote: It seems that CqlStorage can't perform INSERTs when using Pig. Is there a reason for this? Here's the relevant code from 2.0.7:

String cqlQuery = CqlConfigHelper.getOutputCql(conf).trim();
if (cqlQuery.toLowerCase().startsWith("insert"))
    throw new UnsupportedOperationException("INSERT with CqlRecordWriter is not supported, please use UPDATE/DELETE statement");

It seems to me that DELETE and UPDATE are significantly less important than INSERT. My use case is that I'm using Pig to build a custom secondary index, and then loading it back into Cassandra.
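The URL-encoded output_query in the STORE locations above can be generated rather than hand-encoded. A sketch using Python's standard library (the helper name is our own; the keyspace, table, and query come from the example above):

```python
from urllib.parse import quote

def cql_storage_url(keyspace, table, output_query):
    """Build a CqlStorage location string with the output_query
    percent-encoded, as the cql:// URL scheme requires."""
    return f"cql://{keyspace}/{table}?output_query={quote(output_query, safe='')}"

url = cql_storage_url("tcgadata", "barcode_to_uuid",
                      "update barcode_to_uuid set uuid = ?")
print(url)
# cql://tcgadata/barcode_to_uuid?output_query=update%20barcode_to_uuid%20set%20uuid%20%3D%20%3F
```

This keeps the readable UPDATE statement in one place and avoids typos in the %-escapes.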
Re: autoscaling cassandra cluster
Hello Prem, I'm trying to find out whether people are autoscaling up and down automatically, not manually. I'm also interested in whether they are using a cloud based solution, creating and destroying instances. I've found the following regarding GCE, describing how instances can be created and destroyed: https://cloud.google.com/developers/articles/auto-scaling-on-the-google-cloud-platform Thanks Jabbar Azam
Re: autoscaling cassandra cluster
I agree with Prem, but recently someone shared a promising project called cassandra-mesos on this list: https://github.com/mesosphere/cassandra-mesos One of its goals is to make scaling easier. I don't have a personal opinion on it yet, but maybe you could give it a try. Regards, Panagiotis
Re: autoscaling cassandra cluster
That sounds interesting. I was thinking of using CoreOS with Docker containers for the business logic, frontend and Cassandra. I'll also have a look at cassandra-mesos. Thanks Jabbar Azam
Re: autoscaling cassandra cluster
If you're interested in and/or need some Cassandra Docker images, let me know and I'll shoot you a link. James
Re: Can SSTables overlap with SizeTieredCompactionStrategy?
I'm wondering if the lack of response to this means it was a dumb question; however, I've searched the documentation again and I still can't find an answer :-( Phil -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-SSTables-overlap-with-SizeTieredCompactionStrategy-tp7594574p7594627.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
Data locality with cash
Hi, I've had a look at the Hive plugin for Cassandra[1]. Does anyone know if it supports data locality if I install task trackers and job trackers on my Cassandra instances? [1] https://github.com/tuplejump/cash Thanks, Jens
Re: Can SSTables overlap with SizeTieredCompactionStrategy?
I would think it's because of the index and filter files, plus the additional data which gets added because of serialization. Also, since SSTables are only deleted after the compaction is finished, it might be possible that when you checked, the intermediate SSTables were not yet deleted. However, 50% additional disk usage does sound bad.
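For background on why multiple SSTables coexist at all: SizeTieredCompactionStrategy groups SSTables of similar size into buckets and only compacts a bucket once enough tables accumulate in it. A simplified model of that bucketing (the ratio and grouping rule are a rough approximation, not the strategy's exact algorithm):

```python
def bucket_sstables(sizes, bucket_ratio=0.5):
    """Group SSTable sizes into buckets where each size must fall
    within bucket_ratio of the bucket's running average size
    (a simplified sketch of size-tiered bucketing)."""
    buckets = []  # each bucket is [running_average, [member sizes]]
    for size in sorted(sizes):
        for bucket in buckets:
            avg = bucket[0]
            if avg * (1 - bucket_ratio) <= size <= avg * (1 + bucket_ratio):
                bucket[1].append(size)
                bucket[0] = sum(bucket[1]) / len(bucket[1])
                break
        else:
            buckets.append([size, [size]])
    return [members for _, members in buckets]

# Four similar small tables share one bucket; the large one sits alone,
# so it won't be compacted until peers of similar size appear:
print(bucket_sstables([100, 110, 120, 105, 5000]))
```

This is why a lone large SSTable can sit on disk untouched while smaller tables keep merging around it.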
RE: Can SSTables overlap with SizeTieredCompactionStrategy?
Hi Phil, there is no dumb question ;) What is your size estimation based on, e.g. what size is a column in your calculation?
Re: ownership not equally distributed
This issue is resolved. I don't know the exact root cause though. I did a re-image of the server which was taking less token ownership and did the configuration through Chef. Thanks, Rameez

On Sat, May 17, 2014 at 1:06 AM, Rameez Thonnakkal ssram...@gmail.com wrote: Hello, I am having a 4 node cluster where 2 nodes are in one data center and another 2 in a different one. But in the first data center the token ownership is not equally distributed. I am using the vnodes feature. num_tokens is set to 256 in all nodes; initial_token is left blank.

Datacenter: DC1
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  10.145.84.167  84.58 MB   256     0.4%   ce5ddceb-b1d4-47ac-8d85-249aa7c5e971  RAC1
UN  10.145.84.166  692.69 MB  255     44.2%  e6b5a0fd-20b7-4bf9-9a8e-715cfc823be6  RAC1

Datacenter: DC2
Status=Up/Down |/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns   Host ID                               Rack
UN  10.168.67.43   476 MB     256     27.8%  05dc7ea6-0328-43b8-8b70-bcea856ba41e  RAC1
UN  10.168.67.42   413.15 MB  256     27.7%  677025f0-780c-45dc-bb3b-17ad260fba7d  RAC1

I have run nodetool repair a couple of times, but it didn't help. On the node with less ownership I have seen frequent full GCs and had to restart Cassandra. Any suggestions on how to resolve this are highly appreciated. Regards, Rameez
Re: autoscaling cassandra cluster
The mechanics of it are simple compared to figuring out when to scale, especially when you want to be scaling before peak load on your cluster (adding and removing nodes puts additional load on your cluster). We are currently building our own in-house solution for this for our customers. If you want to have a go at it yourself, this is a good starting point: http://techblog.netflix.com/2013/11/scryer-netflixs-predictive-auto-scaling.html and http://techblog.netflix.com/2013/12/scryer-netflixs-predictive-auto-scaling.html Most of this is fairly specific to Netflix, but an interesting read nonetheless. DataStax OpsCenter also provides capacity planning and forecasting, and can provide an easy set of metrics you can make your scaling decisions on: http://www.datastax.com/what-we-offer/products-services/datastax-opscenter Ben Bromhead Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359
How to enable a Cassandra node to participate in multiple clusters
Hello everyone, I want to use a Cassandra cluster for a specific purpose across data centers. What I want to figure out is how I can enable a single Cassandra node to participate in multiple clusters at the same time. I googled it; however, I could not find any use case of Cassandra like the one I mentioned. Is this possible with the current architecture of Cassandra? Salih
Re: How to enable a Cassandra node to participate in multiple cluster
Hello Salih, As far as I'm aware a node can't be in two clusters. In the cassandra.yaml file you can only specify one cluster; the storage system and all the protocols would have to be modified so that information about multiple clusters is passed around. I'm sure somebody else could give you more accurate detail. If you're saving on hardware then you could think about using Docker or virtualisation, but you'll have problems with performance, a bit like the problems you get with small instances at Amazon. Thanks Jabbar Azam
Re: autoscaling cassandra cluster
Hello James, How do you alter your cassandra.yaml file with each node's IP address? I want to use the scaling software (which I've not got yet) to create and destroy the GCE instances, and fleet to deploy and undeploy the Cassandra nodes inside the Docker instances. I do realise I will have to run nodetool to add and remove the nodes from the cluster and also to do the node cleanup. Disclaimer: this is not a production system but something I'm experimenting with in my own time. Thanks Jabbar Azam
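One common answer to the "how do you get each node's IP into cassandra.yaml" question is to template the file at container start-up. A minimal sketch (the `{{NODE_IP}}`/`{{SEEDS}}` placeholder tokens and the template layout are our own convention, not part of any official image):

```python
def render_cassandra_yaml(template, node_ip, seeds):
    """Fill this node's address and the seed list into a cassandra.yaml
    template before Cassandra starts."""
    out = template.replace("{{NODE_IP}}", node_ip)
    return out.replace("{{SEEDS}}", ",".join(seeds))

template = (
    "listen_address: {{NODE_IP}}\n"
    "rpc_address: {{NODE_IP}}\n"
    'seeds: "{{SEEDS}}"\n'
)
result = render_cassandra_yaml(template, "10.240.0.7",
                               ["10.240.0.2", "10.240.0.3"])
print(result)
```

An entrypoint script would discover the instance's IP (e.g. from the GCE metadata service), render the file, then exec Cassandra.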
Re: autoscaling cassandra cluster
Hello Ben, I'm looking forward to reading the Netflix links. Thanks :) Jabbar Azam
Re: Is the tarball for a given release in a Maven repository somewhere?
Hi Clint, On Wed, May 21, 2014 at 5:29 AM, user-digest-h...@cassandra.apache.org wrote: Is the tarball for a given release in a Maven repository somewhere? Hi all, ...snip... I poked around online and could not find what I was looking for. Any help would be appreciated! A Maven repo? No. Currently tar.gz builds are not generated and signed as part of the Cassandra release process and sent to Maven Central. This is an ideal opportunity for someone to step in and improve the process: log a Jira ticket, as it would benefit people if they could find these artifacts in Maven Central as well. You can, however, ALWAYS find archived releases of Cassandra (as with EVERY Apache release) under archive.apache.org. Specifically you will be looking here: http://archive.apache.org/dist/cassandra/ It is up to the Cassandra release manager, and the PMC generally, to keep the main distribution server clean and uncluttered, hence the reason non-current releases are automatically pulled down to archive.a.o. hth Lewis
Fwd: Cassandra pre 2.1 vs 2.1 counter implementation
Hey All, I am new to the C* community. We are planning to use DataStax C* (pre 2.1) in production. We use counters heavily; it is mostly what we do, apart from storing a few months of raw logs in C*. I have gone through the excellent Sylvain Lebresne presentation http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf and the design doc https://issues.apache.org/jira/secure/attachment/12459754/Partitionedcountersdesigndoc.pdf for the pre-2.1 counter implementation in detail. Unfortunately the video is not available.

Questions:
1. I know from https://issues.apache.org/jira/browse/CASSANDRA-6504 that the counters design has changed in 2.1, but I have not been able to get hold of the design doc. Can someone please share the design docs? Also, according to the 6504 issue and the C* codebase (CounterMutation.apply()), C* 2.1 has introduced locks that were not there before. How does this impact the write performance of counters as compared to the pre-2.1 lock-free partitioned counter implementation?
2. What were the major concerns (other than idempotency and overcount due to timeout exceptions) in the pre-2.1 counters architecture that led to a rewrite of the counters implementation?

Thanks for the help. --Unilocal
Re: Cassandra pre 2.1 vs 2.1 counter implementation
Jonathan covered the changes in some detail at one of our recent meetups (at about 36 minutes in, give or take): http://capitalfactory.lifesize.com/videos/video/309/?access_token=shr0003098845257289498283770596639066969 From: http://www.meetup.com/Austin-Cassandra-Users/events/158857962/ tl;dr: They will be slower, but accurate. -- Nate McCall Austin, TX @zznate Co-Founder Sr. Technical Consultant Apache Cassandra Consulting http://www.thelastpickle.com
Re: Is the tarball for a given release in a Maven repository somewhere?
Thanks, Lewis. I created a ticket here: https://issues.apache.org/jira/browse/CASSANDRA-7283 For now I just copied the cassandra and cassandra.in.sh scripts into my project, along with custom configuration files. We already have all of the necessary JARs in our project's lib directory, since our code depends on them anyway. Best regards, Clint
RE: Can SSTables overlap with SizeTieredCompactionStrategy?
We based the estimate on a previous controlled observation. We generated a year's worth of one-minute data for a single identifier and recorded the size of the resulting sstable. By adding the data one month at a time we observed a linear, predictable increase in the sstable size. Using this we simply multiplied by the number of identifiers, in this case 700, to get the 7GB estimate. And as noted above, this estimate is correct once the data is compacted to one sstable, but is wrong when there are multiple sstables.

Phil

Andreas Finke wrote: Hi Phil, there is no dumb question ;) What is your size estimation based on, e.g. what size is a column in your calculation?

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-SSTables-overlap-with-SizeTieredCompactionStrategy-tp7594574p7594641.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
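The estimate described above is a straight linear extrapolation from the single-identifier measurement. Sketched in Python (the ~10 MB per identifier-year figure is a hypothetical value chosen only so that 700 identifiers come to roughly the 7 GB total from the thread; it is not a number Phil reported):

```python
# Linear extrapolation of total sstable size from a measurement of
# one identifier's year of one-minute data, as described above.
def estimate_total_size(bytes_per_identifier_year, num_identifiers):
    return bytes_per_identifier_year * num_identifiers

per_id = 10 * 1024**2                       # assumed ~10 MB per identifier-year (hypothetical)
total = estimate_total_size(per_id, 700)
print(total / 1024**3)                      # roughly 6.8 GiB, i.e. the ~7 GB estimate
```

The point of the thread is that this figure only matches the on-disk size once compaction has merged everything into a single sstable.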
RE: Can SSTables overlap with SizeTieredCompactionStrategy?
Are you sure there is no TTL set on your data? It might explain the shrink in sstable size after compaction.

On 21 May 2014 23:17, Phil Luckhurst phil.luckhu...@powerassure.com wrote:

We based the estimate on a previous controlled observation. We generated a year's worth of one-minute data for a single identifier and recorded the size of the resulting sstable. By adding the data one month at a time we observed a linear, predictable increase in the sstable size. Using this we simply multiplied by the number of identifiers, in this case 700, to get the 7GB estimate. And as noted above, this estimate is correct once the data is compacted to one sstable, but is wrong when there are multiple sstables.

Phil

Andreas Finke wrote: Hi Phil, there is no dumb question ;) What is your size estimation based on, e.g. what size is a column in your calculation?

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Can-SSTables-overlap-with-SizeTieredCompactionStrategy-tp7594574p7594641.html Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
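As a toy illustration of why the total size across several sstables can exceed the single compacted sstable, whether the redundancy comes from TTL'd data, tombstones, or overwrites: a cell written more than once occupies space in every sstable that received a write, but survives compaction only once. A Python sketch (pure illustration, not Cassandra code; the keys and values are made up):

```python
# Two flushed "sstables", each a dict of (identifier, timestamp) -> value.
sstables = [
    {("id1", "2014-05-01T00:00"): "v1"},             # first flush
    {("id1", "2014-05-01T00:00"): "v2",              # overwrite of the same cell
     ("id1", "2014-05-01T00:01"): "v1"},             # plus a genuinely new cell
]

cells_before = sum(len(s) for s in sstables)         # 3 cells on disk pre-compaction

compacted = {}
for s in sstables:                                   # later writes shadow earlier ones
    compacted.update(s)

cells_after = len(compacted)                         # 2 cells survive compaction
```

With purely insert-only, never-expiring data the before and after counts would match, which is why the presence of TTLs or overwrites is the natural suspect for the shrink.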
Re: autoscaling cassandra cluster
You normally don't (Ferry auto-generates the IP addresses). Let's move this conversation to the ferry-user Google group so that we don't pollute this mailing list...

James

Sent from my iPhone

On May 21, 2014, at 3:15 PM, Jabbar Azam aja...@gmail.com wrote:

Hello James, How do you alter your cassandra.yaml file with each node's IP address? I want to use the scaling software (which I've not got yet) to create and destroy the GCE instances. I want to use fleet to deploy and undeploy the Cassandra nodes inside the Docker instances. I do realise I will have to run nodetool to add and remove the nodes from the cluster and also do the node cleanup. Disclaimer: this is not a production system but something I'm experimenting with in my own time.

Thanks, Jabbar Azam

On 21 May 2014 15:51, James Horey j...@opencore.io wrote: If you're interested and/or need some Cassandra Docker images, let me know and I'll shoot you a link. James Sent from my iPhone

On May 21, 2014, at 10:19 AM, Jabbar Azam aja...@gmail.com wrote: That sounds interesting. I was thinking of using CoreOS with Docker containers for the business logic, frontend and Cassandra. I'll also have a look at cassandra-mesos. Thanks, Jabbar Azam

On 21 May 2014 14:04, Panagiotis Garefalakis panga...@gmail.com wrote: I agree with Prem, but recently someone sent this promising project, cassandra-mesos, to this list: https://github.com/mesosphere/cassandra-mesos One of its goals is to make scaling easier. I don't have a personal opinion on it yet, but maybe you could give it a try. Regards, Panagiotis

On Wed, May 21, 2014 at 3:49 PM, Jabbar Azam aja...@gmail.com wrote: Hello Prem, I'm trying to find out whether people are scaling up and down automatically, not manually. I'm also interested in whether they are using a cloud-based solution and creating and destroying instances. I've found the following regarding GCE, describing how instances can be created and destroyed: https://cloud.google.com/developers/articles/auto-scaling-on-the-google-cloud-platform

Thanks, Jabbar Azam

On 21 May 2014 13:09, Prem Yadav ipremya...@gmail.com wrote: Hi Jabbar, with vnodes, scaling up should not be a problem. You could just add machines with the cluster/seed/datacenter conf and they should join the cluster. Scaling down has to be manual: you drain the node and then decommission it. Thanks, Prem

On Wed, May 21, 2014 at 12:35 PM, Jabbar Azam aja...@gmail.com wrote: Hello, Has anybody got a Cassandra cluster which autoscales depending on load or time of day? I've seen the documentation on the DataStax website, and that only mentioned adding and removing nodes, unless I've missed something. I want to know how to do this on Google Compute Engine. This isn't for a production system but a test system (multiple nodes) where I want to learn. I'm not sure how to check the performance of the cluster, whether to use one performance metric or a mix of performance metrics, and then invoke a script to add or remove nodes from the cluster. I'd be interested to know whether people out there are autoscaling Cassandra on demand. Thanks, Jabbar Azam
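On the original question of one metric versus a mix: a common pattern is to scale up aggressively (any metric breaching its upper threshold) but scale down conservatively (all metrics comfortably below their lower thresholds), since removing a Cassandra node means an expensive drain/decommission and data restreaming. A sketch of that decision logic in Python (the metric names and threshold values are hypothetical examples, not recommendations):

```python
# Decide whether to add a node, remove a node, or hold, from a mix of
# cluster metrics. Asymmetric thresholds: ANY hot metric triggers
# scale-up, but scale-down requires ALL metrics to be cold.
def scale_decision(metrics, up_thresholds, down_thresholds):
    if any(metrics[m] > up_thresholds[m] for m in up_thresholds):
        return "add_node"
    if all(metrics[m] < down_thresholds[m] for m in down_thresholds):
        return "remove_node"
    return "hold"

# Hypothetical metrics: CPU utilisation and p99 write latency (ms).
up = {"cpu": 0.8, "p99_write_ms": 50.0}
down = {"cpu": 0.2, "p99_write_ms": 5.0}
```

A script polling these metrics (e.g. via JMX or nodetool output) could act on "add_node" by creating a GCE instance with the cluster/seed config, and on "remove_node" by running nodetool drain and decommission before destroying the instance, as Prem describes above.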