Re: Replacing a dead node by deleting it and auto_bootstrap'ing a new node (Cassandra 2.0)

2014-12-05 Thread Omri Bahumi
I guess Cassandra is aware that it has some replicas not meeting the replication factor. Wouldn't it be nice if a bootstrapping node would get those? Could make things much simpler in the Ops view. What do you think? On Fri, Dec 5, 2014 at 8:31 AM, Jaydeep Chovatia chovatia.jayd...@gmail.com

Cassandra memory joining issues

2014-12-05 Thread farouk . umar
Hello, A recent incident has brought to light that we have potentially two problems. 1. A node can start going up and down, possibly due to memory issues. 2. We can't bring in new nodes. Here is an account of the incident. 3 vnode cluster setup (A, B, C). Cassandra version 2.0.10. 1. We get

Re: Cassandra schema migrator

2014-12-05 Thread Ben Hood
On Tue, Nov 25, 2014 at 12:49 PM, Phil Wise p...@advancedtelematic.com wrote: https://github.com/advancedtelematic/cql-migrate Great to see these tools out there! Just to add to the list https://github.com/mattes/migrate Might not be as C* specific as the other tools mentioned earlier in this

Re: Cassandra schema migrator

2014-12-05 Thread Brian Sam-Bodden
There is also https://github.com/hsgubert/cassandra_migrations On Fri, Dec 5, 2014 at 7:49 AM, Ben Hood 0x6e6...@gmail.com wrote: On Tue, Nov 25, 2014 at 12:49 PM, Phil Wise p...@advancedtelematic.com wrote: https://github.com/advancedtelematic/cql-migrate Great to see these tools out

Re: Cassandra schema migrator

2014-12-05 Thread Phil Wise
I've added these as answers to a question I posted on Stack Overflow: http://stackoverflow.com/questions/26460932/how-to-deploy-changes-to-a-cassandra-cql-schema/27013426 Thank you Phil On 05.12.2014 15:23, Brian Sam-Bodden wrote: There is also https://github.com/hsgubert/cassandra_migrations

Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-05 Thread Robert Wille
At the data modeling class at the Cassandra Summit, the instructor said that lots of small partitions are just fine. I’ve heard on this list that that is not true, and that it’s better to cluster small partitions into fewer, larger partitions. Due to conflicting information on this issue, I’d be

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Tyler Hobbs
On Fri, Dec 5, 2014 at 1:15 AM, Dong Dai daidon...@gmail.com wrote: Sounds great! By the way, will you create a ticket for this, so we can follow the updates? What would the ticket be for? (I might have missed something in the conversation.) -- Tyler Hobbs DataStax http://datastax.com/

Re: Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-05 Thread Tyler Hobbs
On Fri, Dec 5, 2014 at 11:14 AM, Robert Wille rwi...@fold3.com wrote: And let’s say that bucket is computed as id / N. For analysis purposes, let’s assume I have 100 million IDs to store. Table A is obviously going to have a larger Bloom filter. That’s a clear negative. That's true,
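A rough sketch of the bucketing arithmetic mentioned in this message. It assumes that ids are contiguous non-negative integers, that "id / N" means integer division, and that the unbucketed table keeps one partition per id. The names `bucket_for` and `partition_count` and the value N = 1000 are hypothetical and do not come from the thread.

```python
def bucket_for(row_id: int, n: int) -> int:
    """Bucket for a given id, assuming 'id / N' means integer division."""
    return row_id // n


def partition_count(total_ids: int, n: int) -> int:
    """Number of distinct buckets for contiguous ids 0..total_ids-1 (ceiling division)."""
    return -(-total_ids // n)


TOTAL_IDS = 100_000_000  # figure quoted in the message
N = 1000                 # hypothetical bucket size, chosen only for illustration

# One partition per id versus one partition per bucket.
partitions_unbucketed = TOTAL_IDS
partitions_bucketed = partition_count(TOTAL_IDS, N)
```

Under these assumptions the bucketed layout has N times fewer partition keys, so its Bloom filter has correspondingly fewer entries to track.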

Re: Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-05 Thread DuyHai Doan
Another argument for table A is that it makes heavy use of the Bloom filter for fast lookups. If the filter answers negative, there is no disk hit; otherwise there are at most 1 or 2 disk hits, depending on the fp chance. Compaction also works better on skinny partitions. On Fri, Dec 5, 2014 at 6:33 PM, Tyler Hobbs ty...@datastax.com wrote:

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Dong Dai
On Dec 5, 2014, at 11:23 AM, Tyler Hobbs ty...@datastax.com wrote: On Fri, Dec 5, 2014 at 1:15 AM, Dong Dai daidon...@gmail.com wrote: Sounds great! By the way, will you create a ticket for this, so we can follow the updates? What would the ticket be for?

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Philip Thompson
What progress are you trying to be aware of? All of the features Tyler discussed are implemented and can be used. On Fri, Dec 5, 2014 at 2:41 PM, Dong Dai daidon...@gmail.com wrote: On Dec 5, 2014, at 11:23 AM, Tyler Hobbs ty...@datastax.com wrote: On Fri, Dec 5, 2014 at 1:15 AM, Dong Dai

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Dong Dai
Err, am I misunderstanding something? I thought Tyler was going to add some code to split unlogged batches and make batch insertion token-aware. Is it already done? If not, I can do it too. Thanks, - Dong On Dec 5, 2014, at 2:06 PM, Philip Thompson philip.thomp...@datastax.com wrote:

Re: Performance Difference between Batch Insert and Bulk Load

2014-12-05 Thread Philip Thompson
Splitting the batches by partition key and inserting them with a TokenAware policy is already possible with existing driver code, though you will have to split the batches yourself. On Fri, Dec 5, 2014 at 3:12 PM, Dong Dai daidon...@gmail.com wrote: Err, am i misunderstanding something? I
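A minimal sketch of the "split the batches yourself" step described in this message. It covers only the client-side grouping of rows by partition key. Sending each group as its own unlogged batch through a driver configured with a token-aware load-balancing policy is not shown here. The helper name `split_by_partition_key` and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, TypeVar

Row = TypeVar("Row")


def split_by_partition_key(
    rows: Iterable[Row], key_of: Callable[[Row], Hashable]
) -> Dict[Hashable, List[Row]]:
    """Group rows so that every group contains a single partition key."""
    groups: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        groups[key_of(row)].append(row)
    return dict(groups)


# Hypothetical rows of the form (partition_key, value).
rows = [("a", 1), ("b", 2), ("a", 3)]
groups = split_by_partition_key(rows, key_of=lambda r: r[0])

# Each value in `groups` would then become one unlogged batch, so a
# token-aware policy can route it to a replica that owns that partition.
```

Grouping this way keeps each batch within one partition, which is what lets token-aware routing send it directly to an owning replica.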

Re: full gc too often

2014-12-05 Thread Robert Coli
On Thu, Dec 4, 2014 at 8:13 PM, Philo Yang ud1...@gmail.com wrote: Each time, Old Gen shrinks only a little; Survivor Space is cleared, but the heap is still full, so another full GC follows very soon and then the node goes down. If I restart the node, it will be fine without GC trouble.

Re: Repair taking many snapshots per minute

2014-12-05 Thread Robert Coli
On Thu, Dec 4, 2014 at 7:19 AM, Robert Wille rwi...@fold3.com wrote: Does anybody have any idea what might cause this? That it happens at all is bizarre, and that it happens on only three nodes is even more bizarre. Also, it really doesn’t seem to have difficulty creating snapshots, so the

Re: Keyspace and table/cf limits

2014-12-05 Thread Robert Coli
On Wed, Dec 3, 2014 at 1:54 PM, Raj N raj.cassan...@gmail.com wrote: The question is more from a multi-tenancy point of view. We wanted to see if we can have a keyspace per client. Each keyspace may have 50 column families, but if we have 200 clients, that would be 10,000 column families. Do

What companies are using Cassandra to serve customer-facing product?

2014-12-05 Thread jeremy p
Hey all, So, I'm currently evaluating Cassandra + CQL as a solution for querying a very large data set (think 60+ TB). We'd like to use it to directly power a customer-facing product. My question is threefold: 1) What companies use Cassandra to serve a customer-facing product? I'm not

Re: Recommissioned node is much smaller

2014-12-05 Thread Robert Coli
On Wed, Dec 3, 2014 at 10:10 AM, Robert Wille rwi...@fold3.com wrote: Load and ownership didn’t correlate nearly as well as I expected. I have lots and lots of very small records. I would expect very high correlation. I think the moral of the story is that I shouldn’t delete the system

Re: Cassandra taking snapshots automatically?

2014-12-05 Thread Robert Coli
On Wed, Dec 3, 2014 at 10:46 AM, Robert Wille rwi...@fold3.com wrote: No. auto_snapshot is turned on, but snapshot_before_compaction is off. Maybe this will shed some light on it. I tried running nodetool repair. I got several messages saying Lost notification. You should check server log

Re: nodetool repair exception

2014-12-05 Thread Robert Coli
On Wed, Dec 3, 2014 at 6:37 AM, Rafał Furmański rfurman...@opera.com wrote: I see “Too many open files” exception in logs, but I’m sure that my limit is now 150k. Should I increase it? What’s the reasonable limit of open files for cassandra? Why provide any limit? ulimit allows unlimited?

Re: What companies are using Cassandra to serve customer-facing product?

2014-12-05 Thread Tyler Hobbs
This page lists a lot of Cassandra users with descriptions of the use case: http://planetcassandra.org/apache-cassandra-use-cases/ On Fri, Dec 5, 2014 at 3:33 PM, jeremy p athomewithagroove...@gmail.com wrote: Hey all, So, I'm currently evaluating Cassandra + CQL as a solution for querying a

Re: Replacing a dead node by deleting it and auto_bootstrap'ing a new node (Cassandra 2.0)

2014-12-05 Thread Jaydeep Chovatia
I think Cassandra gives us control over what we want to do: a) If we want to replace a dead node, then we should specify -Dcassandra.replace_address=old_node_ipaddress. b) If we are adding new nodes (no replacement), then we do not specify the above option, and tokens get assigned randomly. I can think

How to model data to achieve specific data locality

2014-12-05 Thread Kai Wang
I have a data model question. I am trying to figure out how to model the data to achieve the best data locality for analytic purposes. Our application processes sequences. Each sequence has a unique key in the format of [seq_id]_[seq_type]. For any given seq_id, there is an unlimited number of

Re: Keyspace and table/cf limits

2014-12-05 Thread Kai Wang
On Fri, Dec 5, 2014 at 4:32 PM, Robert Coli rc...@eventbrite.com wrote: On Wed, Dec 3, 2014 at 1:54 PM, Raj N raj.cassan...@gmail.com wrote: The question is more from a multi-tenancy point of view. We wanted to see if we can have a keyspace per client. Each keyspace may have 50 column