Major compaction does not seem to free much disk space if wide rows are used.

2013-05-16 Thread Boris Yen
Hi All, Sorry for the wide distribution. Our Cassandra is running 1.0.10. Recently, we are facing a weird situation. We have a column family containing wide rows (each row might have a few million columns). We delete the columns on a daily basis and we also run major compaction on it

Re: Decommissioned node starts to appear from one node (1.0.11)

2013-05-16 Thread Roshan
I found this bug; it seems it is fixed. But in my situation I can still see the decommissioned node in the JMX console's LoadMap attribute. Might this be the reason why Hector says not enough replicas? Experts, any thoughts? Thanks. -- View this message in context:

Re: Decommissioned node starts to appear from one node (1.0.11)

2013-05-16 Thread Alain RODRIGUEZ
Not sure I understand you correctly, but if you are dealing with ghost nodes that you want to remove, I have never seen a node that could resist an unsafeAssassinateEndpoint. http://grokbase.com/t/cassandra/user/12b9eaaqq4/remove-crashed-node

Re: (unofficial) Community Poll for Production Operators : Repair

2013-05-16 Thread Alain RODRIGUEZ
@Rob: Thanks for the feedback. Yet I still have unexplained weird behavior around repair. Are counters supposed to be repaired too? I mean, while reading at CL.ONE I can get different values depending on which node answers, even after a read repair or a full repair. Shouldn't a repair

vnodes ready for production?

2013-05-16 Thread Alain RODRIGUEZ
Hi, Adding vnodes is a big improvement to Cassandra, specifically because we have a fluctuating load on our Cassandra cluster depending on the week, and it is quite annoying to add some nodes for one or two weeks, move tokens, and then have to remove them and move tokens again. Even more if we could

best practices on EC2 question

2013-05-16 Thread Brian Tarbox
From this list and the NYC* conference it seems that the consensus configuration of C* on EC2 is to put the data on an ephemeral drive and then periodically back the drive up to S3... relying on C*'s inherent fault tolerance to deal with any data loss. Fine, and we're doing this... but we find that

SSTable size versus read performance

2013-05-16 Thread Keith Wright
Hi all, I currently have 2 clusters, one running on 1.1.10 using CQL2 and one running on 1.2.4 using CQL3 and Vnodes. The machines in the 1.2.4 cluster are expected to have better IO performance as we are going from 1 SSD data disk per node in the 1.1 cluster to 3 SSD data disks per node

Re: SSTable size versus read performance

2013-05-16 Thread Edward Capriolo
I am not sure if the new default is to use compression, but I do not believe compression is a good default. I find compression is better for larger column families that are sparsely read. For high-throughput CFs I feel that decompressing larger blocks hurts performance more than the compression helps.

Re: SSTable size versus read performance

2013-05-16 Thread Keith Wright
The biggest reason I'm using compression here is that my data lends itself well to it due to the composite columns. My current compression ratio is 30.5%. Not sure it matters, but my BF false positive ratio is 0.048. From: Edward Capriolo edlinuxg...@gmail.com

Re: SSTable size versus read performance

2013-05-16 Thread Edward Capriolo
When you use compression you should play with your block size. I believe the default may be 32K, but I had more success with 8K: nearly the same compression ratio, less young-gen memory pressure. On Thu, May 16, 2013 at 10:42 AM, Keith Wright kwri...@nanigans.com wrote: The biggest reason I'm using
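For CQL3 tables of that era, the block size discussed above is set per table via the compression map. A minimal sketch, assuming a hypothetical table name (note that already-written sstables keep their old options until rewritten, e.g. by sstableupgrade):

```sql
-- Hypothetical table; chunk_length_kb sets the compression block size (default 64 KB
-- in many versions of that era, here lowered to 8 KB as discussed in the thread).
ALTER TABLE users
  WITH compression = {'sstable_compression': 'SnappyCompressor',
                      'chunk_length_kb': 8};
```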

Re: SSTable size versus read performance

2013-05-16 Thread Keith Wright
Does Cassandra need to load the entire SSTable into memory to uncompress it, or does it only load the relevant block? I ask because if it's the latter, that would not explain why I'm seeing so much higher read MB/s in the 1.2 cluster, as the block sizes are the same in both. From: Edward

Re: (unofficial) Community Poll for Production Operators : Repair

2013-05-16 Thread Janne Jalkanen
Might you be experiencing this? https://issues.apache.org/jira/browse/CASSANDRA-4417 /Janne On May 16, 2013, at 14:49 , Alain RODRIGUEZ arodr...@gmail.com wrote: @Rob: Thanks about the feedback. Yet I have a weird behavior still unexplained about repairing. Are counters supposed to be

Re: best practices on EC2 question

2013-05-16 Thread Janne Jalkanen
On May 16, 2013, at 17:05 , Brian Tarbox tar...@cabotresearch.com wrote: An alternative that we had explored for a while was to do a two stage backup: 1) copy a C* snapshot from the ephemeral drive to an EBS drive 2) do an EBS snapshot to S3. The idea being that EBS is quite reliable, S3 is

Re: (unofficial) Community Poll for Production Operators : Repair

2013-05-16 Thread Alain RODRIGUEZ
I indeed had some of those in the past. But my point is not so much to understand how I can get different counts depending on the node (I consider this a weakness of counters and I am aware of it); my question is more about why those inconsistent, distinct counters never converge even after a

Re: Major compaction does not seem to free much disk space if wide rows are used.

2013-05-16 Thread Louvet, Jacques
Boris, We hit exactly the same issue, and you are correct: the newly created SSTables are the reason most of the column tombstones are not being purged. There is an improvement in the 1.2 train where both the minimum and maximum timestamp for a row are now stored and used during compaction to

Re: SSTable size versus read performance

2013-05-16 Thread Igor
My 5 cents: I'd check blockdev --getra for the data drives - too-high values for read-ahead (defaults to 256 on Debian) can hurt read performance. On 05/16/2013 05:14 PM, Keith Wright wrote: Hi all, I currently have 2 clusters, one running on 1.1.10 using CQL2 and one running on 1.2.4 using

Re: SSTable size versus read performance

2013-05-16 Thread Keith Wright
We actually have it set to 512. I have tried decreasing my SSTable size to 5 MB and changing the chunk size to 8 KB (and ran sstableupgrade to ensure they took effect) but am still seeing similar performance. Is anyone running LZ4 compression in production? I'm thinking of reverting back
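The combination Keith describes could be expressed roughly as follows in 1.2-era CQL3 (table name is made up, not from the thread; as above, existing sstables pick up the new options only after a rewrite pass such as sstableupgrade):

```sql
-- Hypothetical table; 5 MB leveled sstables, LZ4 compression, 8 KB chunks
ALTER TABLE user_events
  WITH compaction = {'class': 'LeveledCompactionStrategy',
                     'sstable_size_in_mb': 5}
   AND compression = {'sstable_compression': 'LZ4Compressor',
                      'chunk_length_kb': 8};
```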

Re: SSTable size versus read performance

2013-05-16 Thread Bryan Talbot
512 sectors for read-ahead. Are your new fancy SSD drives using large sectors? If your read-ahead is really reading 512 x 4KB per random IO, then that 2 MB per read seems like a lot of extra overhead. -Bryan On Thu, May 16, 2013 at 12:35 PM, Keith Wright kwri...@nanigans.com wrote: We

Re: SSTable size versus read performance

2013-05-16 Thread Edward Capriolo
I was going to say something similar. I feel like the SSD drives read much more than the standard drives. Read-ahead/large sectors could, and probably does, explain it. On Thu, May 16, 2013 at 3:43 PM, Bryan Talbot btal...@aeriagames.com wrote: 512 sectors for read-ahead. Are your new fancy SSD
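For reference on the arithmetic in this exchange: blockdev --getra conventionally reports read-ahead in 512-byte sectors, so a setting of 512 would mean 256 KiB per random read; the 2 MB figure quoted above assumes 4 KiB units. A tiny sketch of the conversion (values illustrative, not measured from these clusters):

```python
# blockdev --getra conventionally reports read-ahead in 512-byte sectors.
SECTOR_BYTES = 512

def readahead_bytes(ra_sectors, sector_bytes=SECTOR_BYTES):
    """Bytes transferred per read-ahead window."""
    return ra_sectors * sector_bytes

print(readahead_bytes(512))  # 262144 bytes = 256 KiB per random I/O
print(readahead_bytes(16))   # 8192 bytes = 8 KiB, matching an 8K chunk size
```

Either way, read-ahead far larger than the compression chunk size means each random read drags in data the query never uses.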

Re: SSTable size versus read performance

2013-05-16 Thread Igor
Just in case it will be useful to somebody - here is my checklist for better read performance from SSD: 1. limit read-ahead to 16 or 32 2. enable 'trickle_fsync' (available starting from Cassandra 1.1.x) 3. use the 'deadline' io-scheduler (much more important for rotational drives than for SSD) 4.
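The first three items of the checklist could look roughly like this as system configuration (device name and values are illustrative; commands need root, and the yaml lines go in cassandra.yaml, not the shell):

```shell
# 1. Limit read-ahead on the data drive to 16 sectors (8 KiB)
blockdev --setra 16 /dev/sdb

# 2. Enable trickle_fsync in cassandra.yaml (available from Cassandra 1.1):
#      trickle_fsync: true
#      trickle_fsync_interval_in_kb: 10240

# 3. Use the deadline I/O scheduler for the data drive
echo deadline > /sys/block/sdb/queue/scheduler
```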

Re: Major compaction does not seem to free much disk space if wide rows are used.

2013-05-16 Thread Edward Capriolo
This makes sense. Unless you are running major compaction, a delete can only be purged if the bloom filters confirm the row is not in the sstables outside the compaction set. If your rows are wide, the odds are that they are in most or all sstables, and then finally removing them would be tricky. On Thu,
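Edward's point can be sketched as a predicate; this is a simplified model for illustration, not Cassandra's actual code:

```python
DEFAULT_GC_GRACE_SECONDS = 10 * 24 * 3600  # Cassandra's default gc_grace_seconds

def tombstone_droppable(tombstone_age_s, row_in_uncompacted_sstables,
                        gc_grace_s=DEFAULT_GC_GRACE_SECONDS):
    """Simplified model: a tombstone is purged during compaction only when
    gc_grace has elapsed AND the row key does not appear (per bloom filter)
    in any sstable outside the compaction set."""
    return tombstone_age_s > gc_grace_s and not row_in_uncompacted_sstables

# Wide row present in other sstables: tombstone survives even past gc_grace.
print(tombstone_droppable(11 * 24 * 3600, True))   # False
# Major compaction includes every sstable, so nothing is "outside" the set.
print(tombstone_droppable(11 * 24 * 3600, False))  # True
```

This is why wide rows, which tend to be spread across most sstables, defeat tombstone purging in minor compactions.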

Re: SSTable size versus read performance

2013-05-16 Thread Keith Wright
Thank you for that. I did not have trickle_fsync enabled and will give it a try. I just noticed that when running a describe on my table, I do not see the sstable size parameter (compaction_strategy_options = {'sstable_size_in_mb':5}) included. Is that expected? Does it mean it's using the

Re: SSTable size versus read performance

2013-05-16 Thread Edward Capriolo
LZ4 is supposed to achieve similar compression while using fewer resources than Snappy. It is easy to test: just change it and then run 'nodetool rebuild'. Not sure when LZ4 was introduced, but being that it is new to Cassandra there may not be many large deployments running it yet. On Thu, May 16,

Re: Upgrade 1.1.10 - 1.2.4

2013-05-16 Thread Everton Lima
But the problem is that I would like to use Cassandra embedded. Is this not possible any more? 2013/5/15 Edward Capriolo edlinuxg...@gmail.com You are doing something wrong. What I was suggesting is only a hack for unit tests. You're not supposed to interact with CassandraServer directly like

Re: Exception when running YCSB and Cassandra

2013-05-16 Thread aaron morton
Your nodes are overloaded. I'd recommend using m1.xlarge instead. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 15/05/2013, at 1:59 PM, Rodrigo Felix rodrigofelixdealme...@gmail.com wrote: Hi, I'm

Re:

2013-05-16 Thread aaron morton
Try the IRC room for the java driver or submit a ticket on the JIRA system, see the links here https://github.com/datastax/java-driver Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 15/05/2013, at 5:50 PM, bjbylh

Re: how to access data only on specific node

2013-05-16 Thread aaron morton
Are you using a multi get or a range slice ? Read Repair does not run for range slice queries. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 15/05/2013, at 6:51 PM, Sergey Naumov sknau...@gmail.com wrote: see

Re: The action of the file system at drop column family execution

2013-05-16 Thread aaron morton
When drop column family is executed, the $KS/$CF/ directory certainly remains, irrespective of whether a snapshot was generated. I don't think there is any code there to delete the empty directories. We only care about the files in there. Cheers - Aaron Morton Freelance

Re: How to add new DC to cluster when GossipingPropertyFileSnitch is used

2013-05-16 Thread aaron morton
You should configure the seeds as recommended regardless of the snitch used. You need to update the yaml file to start using the GossipingPropertyFileSnitch, but after that it reads the cassandra-rackdc.properties file to get information about the node. It uses the information in gossip
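The moving parts Aaron describes look roughly like this (DC/rack names are made up for illustration):

```
# cassandra.yaml (every node)
endpoint_snitch: GossipingPropertyFileSnitch
# seed_provider -> parameters -> seeds: list seed nodes from each DC

# cassandra-rackdc.properties (per node; read at startup, then gossiped)
dc=DC2
rack=RAC1
```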

Re: Multiple cursors

2013-05-16 Thread aaron morton
We don't have cursors in the RDBMS sense of things. If you are using Thrift, the recommendation is to use connection pooling and re-use connections for different requests. Note that you cannot multiplex queries over the same Thrift connection; you must wait for the response before issuing
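The pooling pattern Aaron recommends can be sketched generically; this is not pycassa's or any Thrift client's actual API, just the shape of the idea:

```python
import queue

class ConnectionPool:
    """Minimal sketch of client-side connection pooling: connections are
    reused across requests, and each connection serves one request at a
    time, since a Thrift connection cannot multiplex concurrent queries."""

    def __init__(self, factory, size=4):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(factory())

    def execute(self, fn):
        conn = self._pool.get()   # block until a connection is free
        try:
            return fn(conn)       # one outstanding request per connection
        finally:
            self._pool.put(conn)  # return it to the pool for reuse

# Hypothetical connection factory, for illustration only.
pool = ConnectionPool(lambda: object(), size=2)
print(pool.execute(lambda conn: "ok"))  # "ok"
```

Checking a connection out for the full request/response round trip is what enforces the "wait for the response before issuing the next query" rule.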

Re: C++ Thrift client

2013-05-16 Thread aaron morton
(Assuming you have enabled tcp_nodelay on the client socket) Check the server side latency, using nodetool cfstats or nodetool cfhistograms. Check the logs for messages from the GCInspector about ParNew pauses. Cheers - Aaron Morton Freelance Cassandra Consultant New Zealand

Re:

2013-05-16 Thread Dave Brosius
what version of netty is on your classpath? On 05/16/2013 07:33 PM, aaron morton wrote: Try the IRC room for the java driver or submit a ticket on the JIRA system, see the links here https://github.com/datastax/java-driver Cheers - Aaron Morton Freelance Cassandra Consultant

Re: Decommissioned node starts to appear from one node (1.0.11)

2013-05-16 Thread Roshan
Thanks. This is kind of expert advice for me. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Decommission-nodes-starts-to-appear-from-one-node-1-0-11-tp7587842p7587876.html Sent from the cassandra-u...@incubator.apache.org mailing list

Re: pycassa failures in large batch cycling

2013-05-16 Thread John R. Frank
On Tue, 14 May 2013, aaron morton wrote: After several cycles, pycassa starts getting connection failures. Do you have the error stack? Are they TimedOutExceptions, socket timeouts, or something else? I figured out the problem here and made this ticket in jira:

Re: Upgrade 1.1.10 - 1.2.4

2013-05-16 Thread Edward Capriolo
Please give an example of the code you are trying to execute. On Thu, May 16, 2013 at 6:26 PM, Everton Lima peitin.inu...@gmail.com wrote: But the problem is that I would like to use Cassandra embedded. Is this not possible any more? 2013/5/15 Edward Capriolo edlinuxg...@gmail.com You

Announcing Mutagen

2013-05-16 Thread Todd Fast
Mutagen Cassandra is a framework providing schema versioning and mutation for Apache Cassandra. It is similar to Flyway for SQL databases. https://github.com/toddfast/mutagen-cassandra Mutagen is a lightweight framework for applying versioned changes (known as mutations) to a resource, in this