Re: Exception with java driver

2014-06-19 Thread Sylvain Lebresne
Please don't post on two mailing lists at once; it makes it impossible for people who are not subscribed to both mailing lists to follow the thread (and is bad form in general). If unsure which one is the most appropriate, it's fine, pick your best guess (in this case it's clearly a java driver

Re: EBS SSD - Cassandra ?

2014-06-19 Thread Alain RODRIGUEZ
Ok, looks fair enough. Thanks guys. It would be great to be able to add disks when the amount of data grows and add nodes when throughput increases... :) 2014-06-19 5:27 GMT+02:00 Ben Bromhead b...@instaclustr.com:

Are writes to indexes performed asynchronously?

2014-06-19 Thread Tom van den Berge
Hi, I have a column family with a secondary index on one of its columns. I noticed that when I write a row to the column family, and immediately query that row through the secondary index, every now and then it won't give any results. Could it be that Cassandra performs the write to the internal

Re: EBS SSD - Cassandra ?

2014-06-19 Thread Benedict Elliott Smith
I would say this is worth benchmarking before jumping to conclusions. The network being a bottleneck (or a cause of latency) for EBS is, to my knowledge, supposition, and instances can be started with direct connections to EBS if this is a concern. The blog post below shows that even without SSDs the

Best practices for repair

2014-06-19 Thread Paolo Crosato
Hi everybody, we have some problems running repairs on a timely schedule. We have a three node deployment, and we start repair on one node every week, repairing one column family at a time. However, when we run into the big column families, repair sessions usually hang indefinitely, and we have

Metrics library for time series data and Cassandra.

2014-06-19 Thread Kevin Burton
Hey guys. If you haven't seen KairosDB, it's a time series database on top of Cassandra. Anyway, we're deploying it in production. However, the existing APIs are a bit raw (requiring you to send JSON directly) and don't provide much in the way of syntactic sugar. There's the codahale metrics API

Re: Batch of prepared statements exceeding specified threshold

2014-06-19 Thread Pavel Kogan
What a coincidence! Today it happened in my cluster of 7 nodes as well. Regards, Pavel On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I have a 10 node cluster with Cassandra 2.0.8. I am getting these exceptions in the log when I run my code. What my

Re: Best practices for repair

2014-06-19 Thread Jack Krupansky
The DataStax doc should be current best practices: http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_repair_nodes_c.html If you or anybody else finds it inadequate, speak up. -- Jack Krupansky -Original Message- From: Paolo Crosato Sent: Thursday, June 19,

Re: EBS SSD - Cassandra ?

2014-06-19 Thread Nate McCall
If someone really wanted to try this, I recommend adding an Elastic Network Interface or two for gossip and client/API traffic. This lets EBS and management traffic have the pre-configured network. On Thu, Jun 19, 2014 at 6:54 AM, Benedict Elliott Smith belliottsm...@datastax.com wrote: I

Re: Migration 1.2.14 to 2.0.8 causes Tried to create duplicate hard link at startup

2014-06-19 Thread Tom van den Berge
It turns out this is caused by an earlier, failed attempt to upgrade. Removing all pre-sstablemetamigration snapshot directories solved the issue. Credits to Markus Eriksson. On Wed, Jun 11, 2014 at 9:42 AM, Tom van den Berge t...@drillster.com wrote: No, unfortunately I haven't. On Tue,

Re: EBS SSD - Cassandra ?

2014-06-19 Thread Russell Bradberry
Does an elastic network interface really use a different physical network interface? Or is it just to give the ability for multiple IP addresses? On June 19, 2014 at 3:56:34 PM, Nate McCall (n...@thelastpickle.com) wrote: If someone really wanted to try this, I recommend adding an Elastic

Re: Best practices for repair

2014-06-19 Thread Paulo Ricardo Motta Gomes
Hello Paolo, I just published an open source version of the dsetool list_subranges command, which will enable you to perform subrange repair as described in the post. You can find the code and usage instructions here: https://github.com/pauloricardomg/cassandra-list-subranges Currently
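To illustrate the idea behind subrange repair mentioned above, here is a minimal sketch that splits a token range into equal-width pieces, each of which could then be repaired separately. This is not the algorithm used by dsetool or by the linked tool; the function name split_token_range and the equal-width strategy are assumptions made for illustration only.

```python
def split_token_range(start, end, parts):
    """Split the token range (start, end] into `parts` contiguous subranges.

    Hypothetical helper: equal-width splitting by token value, which ignores
    how much data each subrange actually holds.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if end <= start:
        raise ValueError("this sketch does not handle wrapping ranges")
    width = end - start
    # Integer arithmetic keeps the boundaries exact for 64-bit token values.
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


# Example: four subranges over a small range.
for sub_start, sub_end in split_token_range(0, 100, 4):
    print(sub_start, sub_end)
```

Each (start, end) pair would be handed to a repair invocation that accepts explicit start and end tokens, so that a hung session only costs one small subrange rather than a whole column family.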

Issues with internode encryption - Keystore was tampered with, or password was incorrect

2014-06-19 Thread Carlos Scheidecker
Hello, I am using Cassandra 2.1.0-rc1 and trying to set up internode encryption. Here's how I have generated the certificates and keystores: keytool -genkeypair -v -keyalg RSA -keysize 1024 -alias node1 -keystore node1.keystore -storepass 'mypassword' -dname 'CN=Development' -keypass

Re: running out of diskspace during maintenance tasks

2014-06-19 Thread Jens Rantil
Hi Brian, What compaction are you running? Have you tried using leveled compaction? AFAIK it should generally require less disk space during compaction. Cheers, Jens — Sent from Mailbox On Wed, Jun 18, 2014 at 6:02 PM, Brian Tarbox tar...@cabotresearch.com wrote: I'm running on AWS

Re: can I kill very old data files in my data folder (I know that sounds crazy but....)

2014-06-19 Thread Jens Rantil
...and temporarily adding more nodes and rebalancing is not an option? — Sent from Mailbox On Wed, Jun 18, 2014 at 9:39 PM, Brian Tarbox tar...@cabotresearch.com wrote: I don't think I have the space to run a major compaction right now (I'm above 50% disk space used already) and compaction can

Re: EBS SSD - Cassandra ?

2014-06-19 Thread Nate McCall
Sorry - should have been clear I was speaking in terms of route optimizing, not bandwidth. No idea as to the implementation (probably instance specific) and I doubt it actually doubles bandwidth. Specifically: having an ENI dedicated to API traffic did smooth out some recent load tests we did for

Re: Issues with internode encryption - Keystore was tampered with, or password was incorrect

2014-06-19 Thread Carlos Scheidecker
Never mind, fellas. Found the stupid error here. Sharing with you just in case. It was a typo in my script that generates those. I had the '' characters while generating the keystore and certificates. -keystore 'mypassword' while correct is -keystore mypassword I knew it was a certificate issue,

Re: Batch of prepared statements exceeding specified threshold

2014-06-19 Thread Marcelo Elias Del Valle
I know now it's been caused by the heap filling up in some nodes. When it fills up, the node goes down, GC runs more, then the node goes up again. Looking for GCInspector in the log, I see GC takes more time to run each time it runs, as shown below. I have set key cache to 100 MB and I was used

Re: Batch of prepared statements exceeding specified threshold

2014-06-19 Thread Marcelo Elias Del Valle
Pavel, Out of curiosity, did it start to happen before some update? Which version of Cassandra are you using? []s 2014-06-19 16:10 GMT-03:00 Pavel Kogan pavel.ko...@cortica.com: What a coincidence! Today it happened in my cluster of 7 nodes as well. Regards, Pavel On Wed, Jun 18, 2014

Best way to do a multi_get using CQL

2014-06-19 Thread Marcelo Elias Del Valle
I was taking a look at the Cassandra anti-patterns list: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html Among them is SELECT ... IN or index lookups

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Jonathan Haddad
Your other option is to fire off async queries. It's pretty straightforward w/ the java or python drivers. On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle marc...@s1mbi0se.com.br wrote: I was taking a look at Cassandra anti-patterns list:

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Marcelo Elias Del Valle
But wouldn't using async queries be even worse than using SELECT IN? The justification in the docs is that I could query many nodes, but I would still do it. Today, I use both async queries AND SELECT IN: SELECT_ENTITY_LOOKUP = "SELECT entity_id FROM " + ENTITY_LOOKUP + " WHERE name=%s and value in(%s)"

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Jonathan Haddad
If you use async and your driver is token aware, it will go to the proper node, rather than requiring the coordinator to do so. Realistically you're going to have a connection open to every server anyways. It's the difference between you querying for the data directly and using a coordinator as

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Marcelo Elias Del Valle
This is interesting, I didn't know that! It might make sense then to use select = + async + token aware, I will try to change my code. But would it be a recommended solution for these cases? Any other options? I still wonder if this is the right use case for Cassandra, to look for random keys in a

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Jonathan Haddad
The only case in which it might be better to use an IN clause is if the entire query can be satisfied from that machine. Otherwise, go async. The native driver reuses connections and intelligently manages the pool for you. It can also multiplex queries over a single connection. I am assuming
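The "go async" approach discussed in this thread can be sketched as follows: issue one equality lookup per key concurrently, instead of a single SELECT ... IN. This is only an illustration under stated assumptions. FAKE_TABLE, fetch_one and multi_get are hypothetical names, and the in-memory dictionary stands in for a real session; with an actual driver the per-key call would be the driver's own asynchronous execute, ideally combined with token-aware routing.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the lookup table. A real client would
# run a query such as "SELECT entity_id FROM <table> WHERE name=%s AND value=%s".
FAKE_TABLE = {
    ("email", "a@example.com"): [1],
    ("email", "b@example.com"): [2, 3],
}


def fetch_one(name, value):
    """Single-key lookup: one '=' query per key, no IN clause."""
    return FAKE_TABLE.get((name, value), [])


def multi_get(name, values, max_workers=8):
    """Issue one lookup per value concurrently and collect results by value."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every lookup first so they run in parallel, then gather.
        futures = {value: pool.submit(fetch_one, name, value) for value in values}
        return {value: future.result() for value, future in futures.items()}


results = multi_get("email", ["a@example.com", "b@example.com", "c@example.com"])
print(results)
```

The design point is the one made in the thread: each single-key request can be routed directly to a replica that owns the key, rather than asking one coordinator to gather rows from many nodes.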

Re: EBS SSD - Cassandra ?

2014-06-19 Thread Ben Bromhead
Irrespective of performance and latency numbers there are fundamental flaws with using EBS/NAS and Cassandra, particularly around bandwidth contention and what happens when the shared storage medium breaks. Also obligatory reference to