Re: Adding nodes to existing cluster

2015-04-20 Thread Colin Clark
unsubscribe On Apr 20, 2015, at 8:08 AM, Carlos Rolo r...@pythian.com wrote: Independent of the snitch, data needs to travel to the new nodes (plus all the keyspace information that goes via gossip). So I won't bootstrap them all at once, even if it is only for network traffic generated.

Re: Cassandra 1.2.9 will not start

2015-04-18 Thread Colin Clark
unsubscribe On Apr 18, 2015, at 4:26 PM, Bill Miller bmil...@inthinc.com wrote: I tried restarting two nodes that were working and now I get this. INFO 15:13:50,296 Initializing system.range_xfers INFO 15:13:50,300 Initializing system.schema_keyspaces INFO 15:13:50,301 Opening

Re: Replication to second data center with different number of nodes

2015-03-28 Thread Colin Clark
I typically set num_tokens a lot lower than 256, usually less than 20, as a larger number has historically had a dramatic impact on query performance. -- Colin Clark co...@clark.ws +1 612-859-6129 skype colin.p.clark On Mar 28, 2015, at 3:46 PM, Eric Stevens migh...@gmail.com wrote

Re: Mutable primary key in a table

2015-02-08 Thread Colin Clark
is usually considered a bad idea and is simply not even permitted by most RDBMS. — Colin Clark co...@clark.ws +1 320-221-9531 skype colin.p.clark On Feb 8, 2015, at 4:16 PM, Eric Stevens migh...@gmail.com wrote: It sounds like changing user names is the kind of thing which doesn't happen often

Re: 2x disk space required for full compaction? Don't vnodes help this problem?

2014-07-24 Thread Colin Clark
Triggering a major compaction is usually not a good idea. If you've got SSDs, go leveled as DuyHai says. The results will be tasty. -- Colin 320-221-9531 On Jul 24, 2014, at 5:28 PM, Kevin Burton bur...@spinn3r.com wrote: This was after a bootstrap… so I triggered a major compaction.

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin Clark
It's an anti-pattern and there are better ways to do this. I have implemented the paging algorithm you've described using wide rows and bucketing. This approach is a more efficient utilization of Cassandra's built in wholesome goodness. Also, I wouldn't let any number of clients (huge) connect

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin Clark
No, you're not: the partition key will get distributed across the cluster if you're using random or murmur. You could also guarantee distribution by adding another column, like source. (Add the seconds to the partition key, not the clustering columns.) I can almost guarantee that if
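The advice above (put the seconds, and optionally a column such as source, in the partition key rather than in the clustering columns) can be sketched roughly as follows. This is only an illustration in plain Python, not Cassandra driver code; the function names `partition_key` and `clustering_key` and the sample source name are hypothetical and do not come from the thread.

```python
from datetime import datetime, timezone
from typing import Tuple


def partition_key(source: str, ts: datetime) -> Tuple[str, int]:
    """Hypothetical composite partition key: (source, whole epoch second)."""
    # Truncating to whole seconds means every row written by the same
    # source within the same second lands in the same partition.
    return (source, int(ts.timestamp()))


def clustering_key(ts: datetime) -> int:
    """Hypothetical clustering value: sub-second order inside a partition."""
    return ts.microsecond


# Two writes in the same second share a partition; the next second does not.
a = datetime(2014, 6, 7, 17, 14, 0, 250000, tzinfo=timezone.utc)
b = datetime(2014, 6, 7, 17, 14, 0, 900000, tzinfo=timezone.utc)
c = datetime(2014, 6, 7, 17, 14, 1, 0, tzinfo=timezone.utc)
same_bucket = partition_key("feed-a", a) == partition_key("feed-a", b)
next_bucket = partition_key("feed-a", a) != partition_key("feed-a", c)
```

Because the second is part of the partition key, consecutive seconds hash to different places on the ring under a random or murmur partitioner, while rows inside one second stay ordered by the clustering value.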

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin Clark
wrote: Thanks for the feedback on this btw... it's helpful. My notes below. On Sat, Jun 7, 2014 at 5:14 PM, Colin Clark co...@clark.ws wrote: No, you're not: the partition key will get distributed across the cluster if you're using random or murmur. Yes… I'm aware. But in practice

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin Clark
: What's 'source'? You mean like the URL? If source is too random it's going to yield too many buckets. Ingestion rates are fairly high but not insane. About 4M inserts per hour.. from 5-10GB… On Sat, Jun 7, 2014 at 7:13 PM, Colin Clark co...@clark.ws wrote: Not if you add another column

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin Clark
Burton bur...@spinn3r.com wrote: What's 'source'? You mean like the URL? If source is too random it's going to yield too many buckets. Ingestion rates are fairly high but not insane. About 4M inserts per hour.. from 5-10GB… On Sat, Jun 7, 2014 at 7:13 PM, Colin Clark co...@clark.ws wrote

Re: Data model for streaming a large table in real time.

2014-06-07 Thread Colin Clark
, Jun 7, 2014 at 7:38 PM, Colin Clark co...@clark.ws wrote: With 100 nodes, that ingestion rate is actually quite low and I don't think you'd need another column in the partition key. You seem to be set in your current direction. Let us know how it works out. -- Colin 320-221-9531

Re: Nectar client - New Cassandra Client for .Net

2014-06-02 Thread Colin Clark
Is your version of Hector using native protocol or thrift? -- Colin +1 320 221 9531 On Mon, Jun 2, 2014 at 6:41 AM, Peter Lin wool...@gmail.com wrote: I'm happy to announce Concord has decided to open source our port of Hector to .Net. The project is hosted on google code

Re: Nectar client - New Cassandra Client for .Net

2014-06-02 Thread Colin Clark
, 2014 at 8:08 AM, Colin Clark co...@clark.ws wrote: Is your version of Hector using native protocol or thrift? -- Colin +1 320 221 9531 On Mon, Jun 2, 2014 at 6:41 AM, Peter Lin wool...@gmail.com wrote: I'm happy to announce Concord has decided to open source our port of Hector to .Net

Re: Nectar client - New Cassandra Client for .Net

2014-06-02 Thread Colin Clark
in DataStax's git? If it's going to be the standard protocol, then it really should be in apache's repo. That's my biased opinion. On Mon, Jun 2, 2014 at 8:16 AM, Colin Clark co...@clark.ws wrote: Unless a cassandra driver is using the native protocol, it's going to have a very short life

Re: decommissioning a node

2014-05-25 Thread Colin Clark
Try this: nodetool decommission host-id-of-node-to-decommission UN means UP, NORMAL -- Colin +1 320 221 9531 On Sun, May 25, 2014 at 9:09 AM, Tim Dunphy bluethu...@gmail.com wrote: Also for information that may help diagnose this issue I am running cassandra 2.0.7 I am also using these

Re: initial token crashes cassandra

2014-05-17 Thread Colin Clark
You probably generated the wrong token type. Look for a murmur token generator on the Datastax site. -- Colin 320-221-9531 On May 17, 2014, at 7:00 PM, Tim Dunphy bluethu...@gmail.com wrote: Hi and thanks for your response. The puzzling thing is that yes I am using the murmur partition, yet

Re: initial token crashes cassandra

2014-05-17 Thread Colin Clark
Looks like you may have put the token next to the num_tokens property in the yaml file for one node. I would double check the yaml files to make sure the tokens are set up correctly and that the ip addresses are associated with the right entries as well. Compare them to a fresh download if possible to

Re: Datamodel for a highscore list

2014-01-23 Thread Colin Clark
Most of the work I've done like this has used sparse table definitions and the empty column trick. I didn't explain that very well in my last response. I think by using the userid as the rowid, and using the friend id as the column name with the score, that I would put an entire user's friend
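The layout described above (user id as the row id, friend id as the column name, score as the value) can be pictured with a small in-memory stand-in. This is a sketch in plain Python, not Cassandra API code; the names `record_score` and `friend_highscores` and the sample ids are hypothetical.

```python
from typing import Dict, List, Tuple

# One "row" per user: row key = user id, column name = friend id, value = score.
Rows = Dict[str, Dict[str, int]]


def record_score(rows: Rows, user_id: str, friend_id: str, score: int) -> None:
    """Write (or overwrite) a friend's score as a column on the user's row."""
    rows.setdefault(user_id, {})[friend_id] = score


def friend_highscores(rows: Rows, user_id: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Read one row and return its friends ordered by score, highest first."""
    friends = rows.get(user_id, {})
    return sorted(friends.items(), key=lambda kv: kv[1], reverse=True)[:limit]


rows: Rows = {}
record_score(rows, "alice", "bob", 90)
record_score(rows, "alice", "carol", 70)
record_score(rows, "alice", "dave", 40)
top_two = friend_highscores(rows, "alice", limit=2)
```

The point of the layout is that a user's whole friend list sits on a single row, so one row read is enough to build the list; the sorting here happens on the client side.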

Re: Datamodel for a highscore list

2014-01-23 Thread Colin Clark
One of the tricks I've used a lot with cassandra is a sparse df definition and inserted columns programmatically that weren't in the definition. I'd be tempted to look at putting a user's friend list on one row, the row would look like this: ROWID COLUMNS UserID UserId, UserID,

Re: Datamodel for a highscore list

2014-01-22 Thread Colin Clark
How many users and how many games? -- Colin +1 320 221 9531 On Jan 22, 2014, at 10:59 AM, Kasper Middelboe Petersen kas...@sybogames.com wrote: I can think of two cases where something bad would happen in this case: 1. Something bad happens after the increment but before some or all of the

Re: Cassandra to Oracle?

2012-01-22 Thread Colin Clark
You don't have to use oracle and pay money, you can use postgresql for example. Triggers aren't that hard to implement. We actually do all of our mutations now via triggers and we did it inside by effectively overriding the mutate logic itself. On Jan 20, 2012 11:42 AM, Zach Richardson

Re: TechCrunch article on Twitter and Cassandra

2010-07-10 Thread Colin Clark
I'm not aware of anyone classifying what twitter is doing today as 'working.' In fact, I believe that twitter's problems are much larger than just technology but that's a whole different subject. What twitter may have realized is that they don't have the resources of Facebook, that

Re: TechCrunch article on Twitter and Cassandra

2010-07-10 Thread Colin Clark
/EventCloudPro%20 On 7/10/2010 5:21 PM, Benjamin Black wrote: On Sat, Jul 10, 2010 at 12:22 PM, Colin Clark co...@cloudeventprocessing.com wrote: Although I'm a fan of Cassandra, there's no way I'd use it today for my tier 1 deployments, because I don't have the resources of Facebook, and even though

Re: Digg 4 Preview on TWiT

2010-07-06 Thread Colin Clark
What were the right questions? I view Facebook's move away from Cassandra as somewhat significant. And are they indeed using HBase then, and if so, what were the right answers? On 7/6/2010 5:34 AM, David Strauss wrote: On 2010-07-05 15:40, Eric Evans wrote: On Sun, 2010-07-04 at 13:14