Problem with very many small SSTables

2014-12-15 Thread Mathijs Vogelzang
Hi, We have a 6-node cassandra cluster that got into an unstable state because a few servers were very low on Java heap space for a while. This resulted in them flushing an SSTable to disk for almost every write, such that some column families ended up with 1000+ SSTables, most of which contain

Need Help with Cassandra Tombstone

2014-12-15 Thread Chamila Wijayarathna
Hello all, I have a column family where I have to update a field frequency, but it is a clustering key. So I am deleting the existing row and adding a new row again with updated frequency. I want to free the space used for deleted rows as soon as possible, so I decided to change gc_grace_seconds

Re: Need Help with Cassandra Tombstone

2014-12-15 Thread DuyHai Doan
Hello Chamila If you're deleting and inserting again a clustering column, it looks like a queue anti-pattern to be avoided: http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets On Mon, Dec 15, 2014 at 10:06 AM, Chamila Wijayarathna cdwijayarat...@gmail.com

Number of SSTables grows after repair

2014-12-15 Thread Michał Łowicki
Hi, We've noticed that the number of SSTables grows radically after running *repair*. What we did today was to compact everything so that on each node the number of SSTables was < 10. After repair it jumped to ~1600 on each node. What is interesting is that the size of many of them is very small. The smallest ones are ~60

Re: Cassandra Database using too much space

2014-12-15 Thread Jack Krupansky
I also meant to point out that you have to be careful with very wide partitions, like those where the partition key is the year, with all usages for that year. Thousands of rows in a partition is probably okay, but millions could become problematic. 100MB for a single partition is a reasonable
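
The partition-size concern above can be checked with rough arithmetic. The following is a minimal sketch, not a sizing tool: the 100 MB figure is the one mentioned in the message, while the row counts and the 200-byte average row size are made-up values for illustration only.

```python
# Rough partition-size estimate. The 100 MB threshold is the figure quoted in
# the message; the row counts and average row size below are hypothetical.
PARTITION_LIMIT_BYTES = 100 * 1024 * 1024


def estimated_partition_bytes(rows: int, avg_row_bytes: int) -> int:
    """Estimate a partition's size as row count times average row size."""
    return rows * avg_row_bytes


def is_too_wide(rows: int, avg_row_bytes: int,
                limit: int = PARTITION_LIMIT_BYTES) -> bool:
    """Return True when the estimate exceeds the chosen size limit."""
    return estimated_partition_bytes(rows, avg_row_bytes) > limit


# Compare a partition with thousands of rows against one with millions.
for rows in (5_000, 5_000_000):
    size = estimated_partition_bytes(rows, 200)
    print(rows, size, is_too_wide(rows, 200))
```

With these assumed numbers, the partition with thousands of rows stays far below the limit, while the one with millions of rows exceeds it. Real row sizes depend on the schema and should be measured rather than guessed.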

Snappy 1.1.0 Cassandra 2.1.2 compability

2014-12-15 Thread Fredrik Larsson Stigbäck
Is it safe to replace Snappy 1.0.5 in a Cassandra 2.1.2 environment with Snappy 1.1.0? I’ve tried running with 1.1.0 and Cassandra seems to run with no issues and according to this post https://github.com/xerial/snappy-java/issues/60 1.1.0 is

Re: Cassandra Maintenance Best practices

2014-12-15 Thread Neha Trivedi
Thanks very much Jonathan !! On Wed, Dec 10, 2014 at 1:00 PM, Jonathan Haddad j...@jonhaddad.com wrote: I did a presentation on diagnosing performance problems in production at the US and Euro summits, in which I covered quite a few tools and preventative measures you should know when running a

Re: Good partition key doubt

2014-12-15 Thread José Guilherme Vanz
Nice, I got it. =] If I have more questions I'll send other emails. xD Thank you On Thu, Dec 11, 2014 at 12:17 PM, DuyHai Doan doanduy...@gmail.com wrote: What is a good partition key? Is the partition key directly related to my query performance? What are the best practices? A good partition key

Re: batch_size_warn_threshold_in_kb

2014-12-15 Thread Eric Stevens
Unfortunately my Scala isn't the best so I'm going to have to take a little bit to wade through the code. I think the important thing to take from this code is that: 1) execution order is randomized for each run, and new data is randomly generated for each run to eliminate biases. 2) we write

Changing replication factor of Cassandra cluster

2014-12-15 Thread Pranay Agarwal
Hi All, I have a 20-node Cassandra cluster with 500 GB of data and a replication factor of 1. I increased the replication factor to 3 and ran nodetool repair on each node one by one, as the docs say. But it takes hours for one node to finish repair. Is that normal or am I doing something wrong? Also,
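
After a replication factor increase, repair has to stream the additional replicas to the nodes that now own them, so the amount of data involved can be estimated with simple arithmetic. The sketch below assumes the 500 GB in the message is the cluster-wide total of unique data and that it is spread evenly over the 20 nodes; neither assumption is confirmed by the message.

```python
def per_node_load_gb(unique_data_gb: float, replication_factor: int,
                     nodes: int) -> float:
    """Average data held per node, assuming an even distribution."""
    return unique_data_gb * replication_factor / nodes


# Figures from the message; treating 500 GB as the cluster total is an assumption.
UNIQUE_DATA_GB = 500
NODES = 20

before = per_node_load_gb(UNIQUE_DATA_GB, 1, NODES)
after = per_node_load_gb(UNIQUE_DATA_GB, 3, NODES)

# Extra replica data that must be streamed across the cluster in total.
extra_total_gb = UNIQUE_DATA_GB * (3 - 1)

print(f"per node before: {before} GB, after: {after} GB")
print(f"extra data to stream cluster-wide: {extra_total_gb} GB")
```

Under these assumptions each node goes from holding about a third of its final load to the full amount, which gives a sense of how much streaming the repairs involve. It does not by itself say how long a repair should take.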

Is it possible to flush memtable in one virtual center?

2014-12-15 Thread Benyi Wang
We have one ring and two virtual data centers in our Cassandra cluster: one is for Real-Time and the other is for analytics. My questions are: 1. Are there memtables in the Analytics Data Center? To my understanding, there are. 2. Is it possible to flush memtables, if they exist, in the Analytics Data

Re: batch_size_warn_threshold_in_kb

2014-12-15 Thread Jonathan Haddad
You are, of course, free to use batches in your application. Keep in mind however, that both my and Ryan's advice is coming from debugging issues in production. I don't know why your Scala script is performing better on batches than async. It could be: 1) network. are you running the test

Re: Is it possible to flush memtable in one virtual center?

2014-12-15 Thread Hannu Kröger
Hi, You have memtables on each machine. So 1) Yes 2) Yes, in any case you have to run nodetool flush for each node that you want to flush. In this case you run flush each node in your analytics DC. Hannu 2014-12-16 1:20 GMT+02:00 Benyi Wang bewang.t...@gmail.com: We have one ring and two