Re: 答复: two problems about opscenter 3.2
You have to log in, and then you can create topics only under some sections (OpsCenter, Feedback, ...), if I remember well. Alain

2013/8/1 yue.zhang yue.zh...@chinacache.com: thanks Alain. I don't know why I am not permitted to create a topic on the DataStax forum.

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com] Sent: 31 July 2013 18:11 To: user@cassandra.apache.org Subject: Re: two problems about opscenter 3.2

Here is the pointer to the topic on the DS support forum: http://www.datastax.com/support-forums/topic/some-32-bugs-reported-in-the-c-user-ml

2013/7/31 aaron morton aa...@thelastpickle.com: You'll get better Ops Centre support on the DS site http://www.datastax.com/support-forums/ cheers - Aaron Morton, Cassandra Consultant, New Zealand, @aaronmorton, http://www.thelastpickle.com

On 30/07/2013, at 9:23 PM, Alain RODRIGUEZ arodr...@gmail.com wrote: I also see "Waiting for agent information..." at the top of the dashboard page, but again nothing in the logs.

2013/7/30 Alain RODRIGUEZ arodr...@gmail.com: I also have the following message in the dashboard: "Error loading events: Cannot call method 'slice' of null", with events and alerts not showing up. No error in the logs (opscenterd and agents).

2013/7/30 yue.zhang yue.zh...@chinacache.com: problem 1: when I edit any schema > keyspace > column family, it reports "Error saving column family: required argument is not a float". problem 2: the OpsCenter 3.2 release notes (http://www.datastax.com/docs/opscenter/release_notes#opscenter-3-2) say "Added support for CQL3 column families (limited Data Explorer support)", but it does not work. The message shown is: "Viewing data in column families created with CQL3 is not currently supported. We recommend using one of the DataStax drivers or cqlsh as an alternative." thx -heipark
Re: CQL and undefined columns
On Wed, Jul 31, 2013 at 03:10:54PM -0700, Jonathan Haddad wrote: It's advised you do not use compact storage, as it's primarily for backwards compatibility. Yes indeed, I understand what it does and why now, but only because I was pointed to the thrift-to-cql document. The CQL documentation itself doesn't make it at all clear; I was originally under the impression that the way 'COMPACT STORAGE' works was the way CQL works by default, because that's the natural assumption until it's explained why it doesn't work that way. I was pointing out that either the thrift-to-cql document must be wrong, or the CQL document must be wrong, because they contradict each other.
Re: CQL and undefined columns
I am glad this document helped you. I like to point to this 'thrift-to-cql' document, since it was really useful to me when I found it, even if I had to read it at least 3 times entirely and still need to refer back to pieces of it sometimes because of the complexity of what is explained in it. @Sylvain, you did a really good job with this blog post. Thanks a lot, and be sure I will continue sharing it. Alain

2013/8/1 Jon Ribbens jon-cassan...@unequivocal.co.uk: I was pointing out that either the thrift-to-cql document must be wrong, or the CQL document must be wrong, because they contradict each other.
Adding my first node to another one...
Hi everyone, I'm trying to wrap my head around Cassandra's great ability to expand… I set up my first Cassandra node a while ago… it was working great, and the data wasn't so important back then. Since I had a great experience with Cassandra, I decided to migrate my MySQL data to Cassandra step by step. Now the data is starting to be important, so I would like to create another node and add it. Since I had some issues with my data center, I wanted to have a copy (of sensitive data only) in another data center. Quite frankly I'm still a newbie with Cassandra and need your help. First things first…

The already up and running Cassandra node (called A):
- Do I need to change anything in cassandra.yaml to make sure that another node can connect? If yes, should I restart the node (because I would have to warn users about downtime)?
- Since this node should be a seed, the seed list is already set to localhost; is that good enough?

The new node I want to add (called B):
- I know that before starting this node, I should modify the seed list in cassandra.yaml… Is that the only thing I need to do?

It is my first time doing this, so please be gentle ;-) Thank you all, Morgan.
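In case it helps, here is a minimal sketch of the cassandra.yaml settings usually involved when joining a second node to an existing one. The addresses and cluster name are made up; adjust for your network. Note in particular that a seed list of localhost will not let a remote node find A — seeds must be addresses the other node can actually reach:

```yaml
# Node A (existing, will act as seed) -- hypothetical address 10.0.0.1
cluster_name: 'MyCluster'          # must be identical on every node
listen_address: 10.0.0.1           # not localhost: node B must reach it
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1"

# Node B (new) -- hypothetical address 10.0.0.2
cluster_name: 'MyCluster'
listen_address: 10.0.0.2
initial_token: ...                 # calculate explicitly for production
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.0.1"      # points at A, never only at itself
```

Changing listen_address or the seed list on A would require a restart of that node.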
How often to run `nodetool repair`
Hello, I read in the docs that `nodetool repair` should be run regularly unless no delete is ever performed. In my app, I never delete, but I heavily use the TTL feature. Should repair still be run regularly? Also, does repair take less time if it is run regularly? If not, is there a way to run it incrementally? It seems that when I do run repair, it takes a long time and causes high amounts of CPU usage and iowait. Thoughts? Thanks, Carl
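For what it's worth, when repair is needed it is commonly scheduled per node with `nodetool repair -pr` (primary range only, available since 1.1), so that running it on every node covers the ring exactly once; each node must complete a cycle well within gc_grace_seconds (default 864000 s = 10 days). A hypothetical cron sketch, staggered across three nodes (the keyspace name "mykeyspace" is made up):

```
# /etc/cron.d/cassandra-repair -- hypothetical schedule; stagger the nodes
# so only one node repairs at a time.
# node1:
0 1 * * 1  root  nodetool repair -pr mykeyspace
# node2:
0 1 * * 3  root  nodetool repair -pr mykeyspace
# node3:
0 1 * * 5  root  nodetool repair -pr mykeyspace
```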
Cassandra Counter Family
Hi - We are struggling to understand how a counter column family maintains consistency in Cassandra. Say Counter1's value is 1 and it is read by 2 clients at the same time, both of whom want to update the value. After both writes, will it become 3?
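Counter updates in Cassandra are not read-modify-write from the client's point of view: an update is a delta ("add 1"), and the stored value is the sum of the deltas, which is why two concurrent increments both count. A toy sketch of the idea in plain Python (nothing here is Cassandra code):

```python
# Toy model of a delta-based counter: clients submit increments (deltas),
# never absolute values, so concurrent updates are not lost.
class CounterReplica:
    def __init__(self):
        self.deltas = []           # increments received, in any order

    def increment(self, delta):
        self.deltas.append(delta)  # no read-before-write needed

    def value(self):
        return sum(self.deltas)

c = CounterReplica()
c.increment(1)                     # initial state: counter is 1

# Two clients both "read 1", but the read is irrelevant to the update;
# each simply submits a +1 delta:
c.increment(1)                     # client 1
c.increment(1)                     # client 2

print(c.value())                   # -> 3

# Contrast: a naive read-modify-write would have both clients write back
# 1 + 1 = 2, losing one increment.
```

So yes, both updates land and the counter reads 3 (subject to the usual replication and consistency-level caveats).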
Re: How often to run `nodetool repair`
We observed the same behavior. During the last repair the data distribution across nodes was imbalanced as well, resulting in one node bloating. On Aug 1, 2013 12:36 PM, Carl Lerche m...@carllerche.com wrote: Hello, I read in the docs that `nodetool repair` should be regularly run unless no delete is ever performed. [...]
Re: How often to run `nodetool repair`
Hi Carl, The 'repair' is for data reads; compaction will take care of the expired data. The fact that a repair runs long makes me think the nodes receive unbalanced amounts of writes instead. Regards, Arthur

From: Carl Lerche Sent: Thursday, August 01, 2013 12:35 PM To: user@cassandra.apache.org Subject: How often to run `nodetool repair`
Re: Adding my first node to another one...
Hi Morgan, Scaling out depends on several factors. The most intricate is perhaps calculating the tokens. The Cassandra version is also important. At this point I suggest you read the section "Adding Capacity to an Existing Cluster" at http://www.datastax.com/docs/1.0/operations/cluster_management and come back here with questions and more details. Regards, Arthur

-----Original Message----- From: Morgan Segalis Sent: Thursday, August 01, 2013 11:24 AM To: user@cassandra.apache.org Subject: Adding my first node to another one...
Re: How often to run `nodetool repair`
Arthur, Yes, my use case for this Cassandra cluster is analytics. I am building a Google Dapper-like (application tracing) system. I collect application traces and write them to Cassandra. Then, I have periodic rollup tasks that read the data, do some summarization, and write it back. Thoughts on how to manage a write-heavy cluster? Thanks, Carl

On Thu, Aug 1, 2013 at 11:28 AM, Arthur Zubarev arthur.zuba...@aol.com wrote: Hi Carl, The 'repair' is for data reads; compaction will take care of the expired data. The fact that a repair runs long makes me think the nodes receive unbalanced amounts of writes instead. Regards, Arthur
deb packages (and older versions)
Hey folks, Because 1.2.8 hasn't been pushed to the repo yet, I see that I can pick up the package at http://people.apache.org/~eevans/ and install it manually. This is great. I'm wondering though, is there a place where I can pick up Debian packages for older releases? I definitely prefer the package install, but sometimes when we add new nodes to our ring the only package available is the newest one, which doesn't match the rest of our nodes.
Re: deb packages (and older versions)
On 08/01/2013 12:27 PM, David McNelis wrote: is there a place where I can pick up Debian packages for older releases? DataStax maintains a repo that has old versions: http://debian.datastax.com/community/pool/ Blair
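For reference, a hypothetical sketch of installing an older version from that repo with apt (1.2.5 is just an example version string; check the pool listing for what actually exists):

```
# /etc/apt/sources.list.d/cassandra.sources.list -- DataStax community repo
deb http://debian.datastax.com/community stable main

# then, to get a specific older version rather than the newest:
#   apt-get update
#   apt-get install cassandra=1.2.5
# and optionally hold it so a later apt-get upgrade doesn't bump it:
#   echo "cassandra hold" | dpkg --set-selections
```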
Re: deb packages (and older versions)
Thanks. FWIW, did I just blatantly miss some documentation saying those existed there? On Thu, Aug 1, 2013 at 3:32 PM, Blair Zajac bl...@orcaware.com wrote: DataStax maintains a repo that has old versions: http://debian.datastax.com/community/pool/ Blair
Secondary Indexes On Partitioned Time Series Data Question
Hello, Say I have time series data for a table like this:

CREATE TABLE mytimeseries (
    pk_part1 text,
    partition bigint,        -- e.g. a partition per day or per hour
    pk_part2 text,           -- part of the partition key so I can split write load
    message_id timeuuid,
    secondary_key1 text,
    secondary_key2 text,
    ... more columns ...
    PRIMARY KEY ((pk_part1, partition, pk_part2), message_id));

Most of the time I will need to do queries with a pk_part1/partition/pk_part2/message_id range, so this is what I optimize for. Sometimes, however, I will need to do queries with a pk_part1/partition/message_id range and some combination of secondary_key1 (95% of the time there is a one-to-one relationship with pk_part1) or secondary_key2 (for each secondary_key2 there will be many pk_part2 values). In this time series scenario, to efficiently make use of secondary_key1/secondary_key2 as Cassandra secondary indexes for these queries, I assume that secondary_key1/secondary_key2 would really need to be composites combined into one column (in SQL I would create multi-column indexes)? i.e.:

secondary_key_1 - pk_part1 + partition_key + real_secondary_key_1
secondary_key_2 - pk_part2 + partition_key + real_secondary_key_2

Would this be correct? Just making sure I understand how to best use secondary indexes in Cassandra with time series data. thanks in advance, Gareth
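If the composite approach is used, the client builds the denormalized index value itself at write time by concatenating the leading parts with the real key, much like the column order of a multi-column SQL index. A sketch in plain Python (the names mirror the table above, and the separator choice is an assumption; it must never occur inside a part):

```python
# Build a "multi-column index" value by joining the leading columns with
# the real secondary key, in the order you want to query by.
SEP = "\x00"

def composite_key(*parts):
    return SEP.join(str(p) for p in parts)

# secondary_key1 indexed together with pk_part1 and the time partition:
idx1 = composite_key("appA", 20130801, "host42")
assert idx1.split(SEP) == ["appA", "20130801", "host42"]

# Prefix queries on the leading columns still work, because the join
# preserves the ordering of the leading parts:
prefix = composite_key("appA", 20130801) + SEP
assert idx1.startswith(prefix)
```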
Re: Secondary Indexes On Partitioned Time Series Data Question
On Thu, Aug 1, 2013 at 12:49 PM, Gareth Collins gareth.o.coll...@gmail.comwrote: Would this be correct? Just making sure I understand how to best use secondary indexes in Cassandra with time series data. In general unless you ABSOLUTELY NEED the one unique feature of built-in Secondary Indexes (atomic update of base row and index) you should just use a normal column family for secondary index cases. =Rob
Re: deb packages (and older versions)
I don't think they are listed on cassandra.apache.org. BTW, instructions are here: http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/install/installDeb_t.html Blair

On 08/01/2013 12:36 PM, David McNelis wrote: Thanks. FWIW, did I just blatantly miss some documentation saying those existed there?
Re: How often to run `nodetool repair`
On Thu, Aug 1, 2013 at 12:26 PM, Robert Coli rc...@eventbrite.com wrote: TTL is effectively DELETE; you need to run a repair once every gc_grace_seconds. If you don't, data might un-delete itself. How is it possible? Every replica has the TTL, so when it expires every replica has a tombstone. I don't see how you can get data with no tombstone. What do I miss? Andrey
Re: How often to run `nodetool repair`
Cassandra is an excellent choice for write-heavy applications. Reading large sets of data is not as fast and not as easy; you may need to have your client page through it, and you may need slice queries and proper PK + indexes thought out in advance. Regards, Arthur

From: Carl Lerche Sent: Thursday, August 01, 2013 3:03 PM To: user@cassandra.apache.org; Arthur Zubarev Subject: Re: How often to run `nodetool repair`
Re: How often to run `nodetool repair`
TTL is effectively DELETE; you need to run a repair once every gc_grace_seconds. If you don't, data might un-delete itself. The undelete part is not true. btw: With CASSANDRA-4917, TTLed columns will not even create a tombstone (assuming ttl >= gc_grace). The rest of your mail I agree with :-)
Re: Adding my first node to another one...
Hi Arthur, Thank you for your answer. I had read the section "Adding Capacity to an Existing Cluster" prior to posting my question. Actually I was thinking I would like Cassandra to choose the token by itself. Since I want only some column families replicated across the whole cluster, and other column families to stay where they are, balancing doesn't matter so much… I do not find anything about the configuration I should make on the very first (and so far only) node to start the replication. (The configuration of my node A is pretty basic, almost out of the box; I might have changed the name.) How do I make this node know that it will be a seed? My current node A is using Cassandra 1.1.0. Is it compatible if I install the new node with Cassandra 1.2.8, or should I fetch 1.1.0 for node B? Thank you. Morgan.

On 1 August 2013 at 20:32, Arthur Zubarev arthur.zuba...@aol.com wrote: Hi Morgan, Scaling out depends on several factors. The most intricate is perhaps calculating the tokens. [...]
Re: How often to run `nodetool repair`
On Thu, Aug 1, 2013 at 1:16 PM, Andrey Ilinykh ailin...@gmail.com wrote: How is it possible? Every replica has the TTL, so when it expires every replica has a tombstone. I don't see how you can get data with no tombstone. What do I miss? I knew I had heard of cases where repair is required despite TTL, but didn't recall the specifics. Thanks for the opportunity to go look it up... http://comments.gmane.org/gmane.comp.db.cassandra.user/21008 quoting Sylvain Lebresne: The initial question was about "can I use inserting with ttl=1 instead of issuing deletes", ***so that would be a case where you do shadow a previous version with a very small ttl and so repair is important.*** (EMPHASIS rcoli) But you're right that if you only issue data with expiration (no deletes) and that you * either do not overwrite columns * or are sure that when you do overwrite, the value you're overwriting has a ttl that is lesser or equal to the ttl of the value you're overwriting with (+gc_grace to be precise) then yes, ***repair is not necessary because you can't have shadowed values resurfacing.*** (EMPHASIS rcoli) So, to be more precise with my initial statement: TTL is like DELETE in some cases, so unless you are certain that you are not (and will not be) in those cases, you should run repair when using TTL. Also, you will be unable to repair entire keyspaces; you will have to repair on a per-column-family basis, manually excluding CFs matching these criteria, increasing management complexity. =Rob
Re: How often to run `nodetool repair`
On 08/01/2013 01:16 PM, Andrey Ilinykh wrote: How is it possible? Every replica has the TTL, so when it expires every replica has a tombstone. I don't see how you can get data with no tombstone. What do I miss? The only way I can think of is this scenario:

- value A for some key is written with ttl=30days to all replicas (i.e. a long TTL, or no TTL at all)
- value B for the same key is written with ttl=1day, but doesn't reach all replicas
- one day passes and the ttl=1day values turn into deletes
- gc_grace passes and the tombstones are purged

At this point, the replica that didn't get the ttl=1day value will think the older value A is live. I'm no expert on this so I may be mistaken, but in any case it's a corner case, as overwriting columns with shorter TTLs would be unusual. - Erik -
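Erik's scenario can be sketched as a toy simulation in plain Python (the timestamp and compaction model here is greatly simplified and is not Cassandra's actual storage logic):

```python
# Toy model: replica R1 saw the short-TTL overwrite B, replica R2 missed
# it. After B's tombstone is purged, R2's stale value A "wins" again.
DAY = 86400
GC_GRACE = 10 * DAY

def compact(cells, now):
    """Keep only the newest cell; drop even that if it is a purgeable tombstone."""
    if not cells:
        return []
    newest = max(cells)                      # (written, value, ttl); shadows older cells
    t, val, ttl = newest
    if ttl is not None and now >= t + ttl + GC_GRACE:
        return []                            # expired and past gc_grace: purged
    return [newest]

def read(cells, now):
    cells = compact(cells, now)
    if not cells:
        return None
    t, val, ttl = cells[0]
    if ttl is not None and now >= t + ttl:
        return None                          # expired, tombstone still held
    return val

R1 = [(0, "A", 30 * DAY), (1000, "B", 1 * DAY)]  # saw both writes
R2 = [(0, "A", 30 * DAY)]                        # missed the overwrite B

now = 1000 + 1 * DAY + GC_GRACE + 1              # B expired AND its tombstone purged
print(read(R1, now), read(R2, now))              # -> None A
```

A repair run before the tombstone was purged would have propagated B (or its tombstone) to R2, preventing the resurrection.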
Re: Secondary Indexes On Partitioned Time Series Data Question
Hi Robert, Can you shed some more light (or point towards some other resource) on why you think built-in secondary indexes should not be used easily or without much consideration? Thanks. Regards, Shahab

On Thu, Aug 1, 2013 at 3:53 PM, Robert Coli rc...@eventbrite.com wrote: In general unless you ABSOLUTELY NEED the one unique feature of built-in Secondary Indexes (atomic update of base row and index) you should just use a normal column family for secondary index cases. =Rob
Re: Adding my first node to another one...
I recommend you do not add 1.2 nodes to a 1.1 cluster. We tried this, and ran into many issues. Specifically, the data will not correctly stream from the 1.1 nodes to the 1.2 nodes, and it will never bootstrap correctly. On Thu, Aug 1, 2013 at 2:07 PM, Morgan Segalis msega...@gmail.com wrote: My current node A is using Cassandra 1.1.0. Is it compatible if I install the new node with Cassandra 1.2.8, or should I fetch 1.1.0 for node B? -- Jon Haddad http://www.rustyrazorblade.com skype: rustyrazorblade
Re: Adding my first node to another one...
On Thu, Aug 1, 2013 at 2:07 PM, Morgan Segalis msega...@gmail.com wrote: Actually I was thinking I would like Cassandra to choose the token by itself. You NEVER want Cassandra to choose its own token in production. There is no advantage to doing so and significant risk when used as a matter of course. The conf file even says you should manually specify tokens in production. How do I make this node know that it will be a seed? The only thing that makes a node a seed is that some other node has it in its seed list. My current node A is using Cassandra 1.1.0. You should not run 1.1.0; it contains significant and serious bugs. You should upgrade to the top of the 1.1 series ASAP. Is it compatible if I install the new node with Cassandra 1.2.8, or should I fetch 1.1.0 for node B? It is not compatible; use 1.1.x with 1.1.x. =Rob
Re: Adding my first node to another one...
Hi Rob, On 2 August 2013 at 00:15, Robert Coli rc...@eventbrite.com wrote: You NEVER want Cassandra to choose its own token in production. There is no advantage to doing so and significant risk when used as a matter of course. The conf file even says you should manually specify tokens in production. OK, then I'll try to understand this token thing. The only thing that makes a node a seed is that some other node has it in its seed list. Good to know, thanks! You should not run 1.1.0; it contains significant and serious bugs. You should upgrade to the top of the 1.1 series ASAP. Of course I need to upgrade Cassandra, but I won't do that until I have another node that can take over while I'm upgrading. It is not compatible; use 1.1.x with 1.1.x. Yeah, that's what I thought! =Rob Thank you for your tips.
Re: Secondary Indexes On Partitioned Time Series Data Question
On Thu, Aug 1, 2013 at 2:34 PM, Shahab Yunus shahab.yu...@gmail.com wrote: Can you shed some more light (or point towards some other resource) on why you think built-in secondary indexes should not be used easily or without much consideration? Thanks.

1) Secondary indexes are more or less modeled like a manual pseudo Secondary Index CF would be.
2) Except they are more opaque than doing it yourself. For example, you cannot see information on them in nodetool cfstats.
3) And there has been a steady trickle of bugs relating to their implementation, in many cases resulting in them not returning the data they should. [1]
4) These bugs would not apply to a manual pseudo Secondary Index CF.
5) And the only benefits you get are the marginal convenience of querying the secondary index instead of a second CF, and atomic synchronized update.
6) Which most people do not actually need.

tl;dr: unless you need the atomic update property, just use a manual pseudo secondary index CF. =Rob

[1] https://issues.apache.org/jira/browse/CASSANDRA-4785 , https://issues.apache.org/jira/browse/CASSANDRA-5540 , https://issues.apache.org/jira/browse/CASSANDRA-2897 , etc.
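For reference, a manual pseudo secondary index CF of the kind described above might look like this in CQL3 (table and column names are made up; the application writes to both tables and accepts that the two writes are not atomic):

```
CREATE TABLE users (
    user_id uuid PRIMARY KEY,
    email   text,
    name    text
);

-- the "index": partition by the indexed value, cluster by the base key
CREATE TABLE users_by_email (
    email   text,
    user_id uuid,
    PRIMARY KEY (email, user_id)
);

-- lookup: SELECT user_id FROM users_by_email WHERE email = ?;
```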
Re: Secondary Indexes On Partitioned Time Series Data Question
Thanks a lot. Regards, Shahab On Thu, Aug 1, 2013 at 8:32 PM, Robert Coli rc...@eventbrite.com wrote: tl;dr: unless you need the atomic update property, just use a manual pseudo secondary index CF. =Rob