Re: Lucene index plugin for Apache Cassandra

2015-06-12 Thread Andres de la Peña
Unfortunately, we haven't published any benchmarks yet, but we plan to do so as soon as possible. However, you can expect behavior similar to that of Elasticsearch or Solr, with some overhead due to the need to index both Cassandra's row key and the partition's token. You can

Question about nodetool status ... output

2015-06-12 Thread Jens Rantil
Hi, I have one node in my 5-node cluster that effectively owns 100% and it looks like my cluster is rather imbalanced. Is it common to have it this imbalanced for 4-5 nodes? My current output for a keyspace is: $ nodetool status myks Datacenter: Cassandra = Status=Up/Down |/

Re: Question about nodetool status ... output

2015-06-12 Thread Carlos Rolo
Your data model also contributes to the balance (or lack thereof) of the cluster. If your data partitioning is really bad, Cassandra will not do any magic. Regarding that cluster, I would decommission the x.52 node and add it again with the correct configuration. After the bootstrap, run a cleanup.

Question regarding concurrent bootstrapping

2015-06-12 Thread Jens Rantil
Hi, Let's say I have an existing cluster and do the following: 1. I start a new joining node (A). It enters state Up/Joining. Streaming automatically starts to this node. 2. I wait two minutes (best practise for bootstrapping). 3. I start a second node (B) to join the cluster. It

Re: Question about nodetool status ... output

2015-06-12 Thread Jens Rantil
Hi Carlos, Yes, I should have been more specific about that; basically all my primary IDs are random UUIDs, so I find it very hard to believe that my data model is the problem here. I will run a full repair of the cluster, execute a cleanup and recommission the node, then. Thanks, Jens

Re: Lucene index plugin for Apache Cassandra

2015-06-12 Thread Carlos Rolo
Seems like an interesting tool! What operational recommendations would you make to users of this tool (Extra hardware capacity, extra metrics to monitor, etc)? Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin:

Re: Atomic behavior and efficiency of a DELETE query with an IN clause

2015-06-12 Thread Sotirios Delimanolis
Similarly, should we send multiple SELECT requests or a single one with a SELECT...IN ? On Wednesday, June 10, 2015 11:27 AM, Sotirios Delimanolis sotodel...@yahoo.com wrote: Will this "eventually they will all go through" behavior apply to the IN? How is this query written to the

Re: Atomic behavior and efficiency of a DELETE query with an IN clause

2015-06-12 Thread Jonathan Haddad
Multiple async requests. IN() is a performance nightmare unless you're querying against a single partition key. On Fri, Jun 12, 2015 at 1:09 PM Sotirios Delimanolis sotodel...@yahoo.com wrote: Similarly, should we send multiple SELECT requests or a single one with a SELECT...IN ? On
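
The advice above (several concurrent single-partition requests instead of one multi-partition IN) can be sketched roughly as follows. This is only an illustration under stated assumptions: `fetch_one` is a hypothetical stand-in for whatever per-key query call the client driver provides, and `fake_fetch` replaces a real database call so that the snippet runs on its own.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_many(fetch_one, keys, max_workers=8):
    """Run one single-partition lookup per key concurrently.

    fetch_one is a hypothetical callable (key -> result) standing in for
    a per-key query such as: SELECT ... WHERE id = ?
    """
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with keys.
        results = pool.map(fetch_one, keys)
        return dict(zip(keys, results))


def fake_fetch(key):
    # Assumption: placeholder for a real query against one partition key.
    return {"id": key}


rows = fetch_many(fake_fetch, ["a", "b", "c"])
```

Each request here targets exactly one partition key, so no single request has to gather results for many partitions the way a multi-partition IN does.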

Re: Question regarding concurrent bootstrapping

2015-06-12 Thread Robert Coli
On Fri, Jun 12, 2015 at 5:21 AM, Jens Rantil jens.ran...@tink.se wrote: Let's say I have an existing cluster and do the following: 1. I start a new joining node (A). It enters state Up/Joining. Streaming automatically starts to this node. 2. I wait two minutes (best practise for

Dropped mutation messages

2015-06-12 Thread Robert Wille
I am preparing to migrate a large amount of data to Cassandra. In order to test my migration code, I’ve been doing some dry runs to a test cluster. My test cluster is 2.0.15, 3 nodes, RF=1 and CL=QUORUM. I know RF=1 and CL=QUORUM is a weird combination, but my production cluster that will
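
As a side note on the RF=1 and CL=QUORUM combination mentioned above: a quorum is floor(RF / 2) + 1 replicas, so with RF=1 it is a single replica. A minimal sketch of that arithmetic (the function name `quorum` is illustrative, not part of any Cassandra API):

```python
def quorum(replication_factor):
    """Number of replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


# RF=1 needs 1 replica; RF=3 needs 2 replicas.
quorum_rf1 = quorum(1)
quorum_rf3 = quorum(3)
```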

Re: Cassandra 2.2, 3.0, and beyond

2015-06-12 Thread Robert Coli
On Thu, Jun 11, 2015 at 6:56 PM, Mohammed Guller moham...@glassbeam.com wrote: By that logic, 2.1.0 should have been somewhat as stable as 2.0.10 (the last release of the 2.0.x branch before 2.1.0). However, we found out that it took almost 9 months for the 2.1.x series to become stable and suitable

Re: Dropped mutation messages

2015-06-12 Thread Robert Wille
I meant to say I’m *not* overloading my cluster. On Jun 12, 2015, at 6:52 PM, Robert Wille rwi...@fold3.com wrote: I am preparing to migrate a large amount of data to Cassandra. In order to test my migration code, I’ve been doing some dry runs to a test cluster. My test cluster is 2.0.15, 3

RE: Lucene index plugin for Apache Cassandra

2015-06-12 Thread Mohammed Guller
The plugin looks cool. Thank you for open sourcing it. Does it support faceting and other Solr functionality? Mohammed From: Andres de la Peña [mailto:adelap...@stratio.com] Sent: Friday, June 12, 2015 3:43 AM To: user@cassandra.apache.org Subject: Re: Lucene index plugin for Apache Cassandra

Re: Lucene index plugin for Apache Cassandra

2015-06-12 Thread Andres de la Peña
I really appreciate your interest. Well, the first recommendation is to not use it unless you need it, because a properly denormalized Cassandra model is almost always preferable to indexing. Lucene indexing is a good option when there is no viable denormalization alternative. This is the case of

RE: Support for ad-hoc query

2015-06-12 Thread SEAN_R_DURITY
I will note here that the limitations on ad-hoc querying (and aggregates) make it much more difficult to deal with data quality problems, QA testing, and similar efforts, especially where people are used to a more relational, ad-hoc model. We have often had to extract data from Cassandra to

Re: Support for ad-hoc query

2015-06-12 Thread Jack Krupansky
No dispute about that. But the main design requirement Cassandra strives to meet is to be a blazing fast transactional database - here's the key, give me the data, and here's the key, write this data. Any additional query requirements are a distant second at best. A big part of that transactional

My dse-spark app goes well with spark-submit, BUT GOT STUCK while executing by sbt run or java jar run on my win-pc

2015-06-12 Thread 126
My dse-spark app runs fine with spark-submit, but gets stuck when executed by sbt run or java jar run on my Windows PC, which means the driver process is on a PC other than a DSE cluster node. And what frustrates me is that when I looked through the logs, I saw no error, but it just hang

connections remain on CLOSE_WAIT state after process is killed after upgrade to 2.0.15

2015-06-12 Thread Paulo Ricardo Motta Gomes
Hello, We recently upgraded a cluster from 2.0.12 to 2.0.15 and now whenever we stop/kill a Cassandra process, some other nodes keep a connection with the dead node in the CLOSE_WAIT state on port 7000 for about 5-20 minutes. So, if I start the killed node again, it cannot handshake with the