Re: Cassandra Needs to Grow Up by Version Five!

2018-02-22 Thread Eric Plowe
Cassandra, hard to use? I disagree completely. With that said, there are definitely deficiencies in certain parts of the documentation, but nothing that is a show stopper. We’ve been using Cassandra since the sub 1.0 days and have had nothing but great things to say about it. With that said, its

RE: Cassandra Needs to Grow Up by Version Five!

2018-02-22 Thread Jacques-Henri Berthemet
Hi Kenneth, As a Cassandra user I value usability, but since it's a database I value consistency and performance even more. If you want usability and documentation you can use Datastax DSE, after all that's where they add value on top of Cassandra. Since Datastax actually paid dev to work

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-22 Thread Sylvain Lebresne
> > I have to disagree with people here and point out that just creating > JIRA's and (trying to) have discussions about these issues will not lead to > change in any reasonable timeframe, because everyone who could do the work > has an endless list of bigger fish to fry. I strongly encourage you

Re: Cassandra Needs to Grow Up by Version Five!

2018-02-22 Thread Oleksandr Shulgin
On Thu, Feb 22, 2018 at 9:50 AM, Eric Plowe wrote: > Cassandra, hard to use? I disagree completely. With that said, there are > definitely deficiencies in certain parts of the documentation, but nothing > that is a show stopper. True, there are no show-stoppers from the

RE: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-22 Thread Fd Habash
“ data was allowed to fully rebalance/repair/drain before the next node was taken off?” -- Judging by the messages, the decomm was healthy. As an example StorageService.java:3425 - Announcing that I have left the ring for 3ms

Tracing cql code being run through the drive

2018-02-22 Thread Jonathan Baynes
Hi Community, Can anyone help me understand what classes I'd need to set logging on if I want to capture the CQL commands being run through the driver, similar to how Profiler (MSSQL) would work? I need to see what's being run, and whether the query is actually getting to Cassandra. Has anyone

Re: Tracing cql code being run through the drive

2018-02-22 Thread Lucas Benevides
I don't know if it will help you, but when the debug log is turned on, it displays the slow queries. To decide whether a query is slow, the parameter read_request_timeout_in_ms is considered. Maybe if you decrease it, you can monitor your queries with $ tail -F debug.log Just an idea, I've never tried it. Surely

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jonathan Haddad
If it's a new cluster, there's no need to disable auto_bootstrap. That setting prevents the first node in the second DC from being a replica for all the data in the first DC. If there's no data in the first DC, you can skip a couple steps and just leave it on. Leave it on, and enjoy your

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-22 Thread Carl Mueller
Your partition sizes aren't ridiculous... kinda big cells if there are 4 cells and 12 MB partitions, but still I don't think that is ludicrous. Whelp, I'm out of ideas from my "pay grade". Honestly, with AZ/racks you theoretically might have been able to take the nodes off

Re: Secondary Indexes C* 3.0

2018-02-22 Thread DuyHai Doan
Read this: http://www.doanduyhai.com/blog/?p=13191 On Thu, Feb 22, 2018 at 6:44 PM, Akash Gangil wrote: > To provide more context, I was going through this > https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html# > useWhenIndex__highCardCol > > On Thu,

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jean Carlo
Hi Jonathan, Thank you for the answer. Do you know where to look to understand why this works? As I understood, all the nodes will then choose random tokens. How can I ensure the correctness of the ring? So as you said, under the condition that there is no data in the cluster, I can initialize a

Secondary Indexes C* 3.0

2018-02-22 Thread Akash Gangil
Hi, I was wondering if there are recommendations around the cardinality of secondary indexes. As I understand it, an index on a column with many distinct values will be inefficient. Is it because the index would only direct me to the specific sstable, but then it sequentially searches for the target

Re: Secondary Indexes C* 3.0

2018-02-22 Thread Akash Gangil
To provide more context, I was going through this https://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html#useWhenIndex__highCardCol On Thu, Feb 22, 2018 at 9:35 AM, Akash Gangil wrote: > Hi, > > I was wondering if there are recommendations around the

Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jean Carlo
Hello, I would like to clarify this: in order to initialize a Cassandra multi-DC cluster, without data. If I follow the DataStax documentation https://docs.datastax.com/en/cassandra/2.1/cassandra/initialize/initializeMultipleDS.html It says - auto_bootstrap: false (Add this setting

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jonathan Haddad
Kenneth, if you want to take the JIRA, feel free to self-assign it and put up a pull request or patch, and I'll review. I'd be very happy to get more people involved in the docs. On Thu, Feb 22, 2018 at 12:56 PM Kenneth Brotman wrote: > That

RE: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Kenneth Brotman
I will heavy lift the docs for a while, do my Slender Cassandra reference project and then I’ll try to find one or two areas where I can contribute code to get going on that. I have read the section on contributing before I start. I’ll self-assign the JIRA right now. Kenneth Brotman

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jon Haddad
Great question. Unfortunately, our OSS docs lack a step by step process on how to add a DC, I’ve created a JIRA to do that: https://issues.apache.org/jira/browse/CASSANDRA-14254 The datastax docs are pretty good for this though:

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jean Carlo
Hi Jonathan, Yes, I do think the doc is a good idea. About the clarification, is this still true for 2.1? We are planning to upgrade to 3.1, but not in the next months. We will stick with 2.1 for a few more months. I believe this is also true for 2.1, but I would like to

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Jon Haddad
In 2.1 token allocation is random, and the distribution doesn’t work as nicely. Everything else is the same. Do not use 3.1. Under any circumstances. Guessing that’s a typo but I just want to be sure. Jon > On Feb 22, 2018, at 1:45 PM, Jean Carlo wrote: > > Hi
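Jon's remark that random token allocation "doesn't work as nicely" can be illustrated with a toy simulation. The sketch below is not Cassandra code: it assumes the Murmur3Partitioner token range (-2^63 to 2^63-1), picks tokens uniformly at random, and reports the fraction of the ring each node ends up owning. The function name `ownership` and its parameters are hypothetical, chosen only for this example.

```python
import random

# Assumption: Murmur3Partitioner token range, -2**63 .. 2**63 - 1.
MIN_TOKEN = -2**63
RING_SIZE = 2**64


def ownership(num_nodes, tokens_per_node, seed=0):
    """Return the fraction of the ring owned by each node.

    Toy model: every node draws `tokens_per_node` tokens uniformly at
    random, and each token owns the range from the previous token on the
    ring (exclusive) up to itself (inclusive). Assumes at least two
    tokens in total.
    """
    rng = random.Random(seed)
    ring = []
    for node in range(num_nodes):
        for _ in range(tokens_per_node):
            token = rng.randrange(MIN_TOKEN, MIN_TOKEN + RING_SIZE)
            ring.append((token, node))
    ring.sort()

    owned = [0] * num_nodes
    for i, (token, node) in enumerate(ring):
        # ring[i - 1] wraps to the last token when i == 0.
        previous = ring[i - 1][0]
        owned[node] += (token - previous) % RING_SIZE
    return [o / RING_SIZE for o in owned]


if __name__ == "__main__":
    for vnodes in (1, 16, 256):
        shares = ownership(num_nodes=6, tokens_per_node=vnodes, seed=42)
        print(vnodes, "tokens per node: min share",
              round(min(shares), 3), "max share", round(max(shares), 3))
```

Running it with different `tokens_per_node` values shows how far the smallest and largest shares drift from the ideal 1/6 under purely random placement; it says nothing about the allocation algorithm in later versions.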

RE: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Kenneth Brotman
That information would have saved me time too. Thanks for making a JIRA for it Jon. Perhaps this is a good JIRA for me to begin with. Kenneth Brotman From: Jon Haddad [mailto:jonathan.had...@gmail.com] On Behalf Of Jon Haddad Sent: Thursday, February 22, 2018 11:11 AM To: user

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Oleksandr Shulgin
On Thu, Feb 22, 2018 at 5:36 PM, Jean Carlo wrote: > Hello > > I would like to clarify this, > > In order to initialize a cassandra multi dc cluster, without data. If I > follow the documentation datastax > >

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Oleksandr Shulgin
On Thu, Feb 22, 2018 at 8:11 PM, Jon Haddad wrote: > Great question. Unfortunately, our OSS docs lack a step by step process > on how to add a DC, I’ve created a JIRA to do that: > https://issues.apache.org/jira/browse/CASSANDRA-14254 > Thanks. I'd love to contribute as

Re: Initializing a multiple node cluster (multiple datacenters)

2018-02-22 Thread Oleksandr Shulgin
On Thu, Feb 22, 2018 at 5:42 PM, Jonathan Haddad wrote: > If it's a new cluster, there's no need to disable auto_bootstrap. > True. > That setting prevents the first node in the second DC from being a replica > for all the data in the first DC. > Not sure where did you

RE: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

2018-02-22 Thread Fd Habash
One more observation … When we compare read latencies between the non-prod (where nodes were removed) and prod clusters, even though the node load as measured by the size of the /data dir is similar, the read latencies are 5 times slower in the downsized non-prod cluster. The only difference we see is