SizeTiered to Leveled Compaction runs out of RAM

2016-01-05 Thread Chris Elsmore
We run a fairly small production Cassandra 2.2.4 cluster with 5 nodes on Rackspace VMs (4 cores, 4 GB RAM, SSD-backed), and whilst these nodes are on the small side, day to day it has kept up with our workload fine. We currently use SizeTieredCompactionStrategy and want to move to the

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
Thanks for responding! My natural partition key is a customer id. Our customers have widely varying amounts of data. Since the vast majority of them have data that's small enough to fit in a single partition, I'd like to avoid imposing unnecessary overhead on the 99% just to avoid issues with the

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Nate McCall
> In this case, 99% of my data could fit in a single 50 MB partition. But if I use the standard approach, I have to split my partitions into 50 pieces to accommodate the largest data. That means that to query the 700 rows for my median case, I have to read 50 partitions instead of one.

Re: Best way to get Cassandra status in Bash

2016-01-05 Thread Giovanni Usai
Hello, thanks to everyone for the fast replies! Unfortunately, since yesterday afternoon I have been assigned to a more urgent task, so I will implement the solutions you proposed in my spare time and let you know the outcomes asap (hopefully in a few weeks). Thanks a lot again! Best

Re: Requesting some details for my use case

2016-01-05 Thread Jack Krupansky
Bear in mind that you won't be able to merely "tune" your schema - you will need to completely redesign your data model. Step one is to look at all of the queries you need to perform and get a handle on what flat, denormalized data model they will need to execute performantly in a NoSQL database.

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jack Krupansky
Jim, I don't quite get why you think you would need to query 50 partitions to return merely hundreds or thousands of rows. Please elaborate. I mean, sure, for that extreme 100th percentile, yes, you would query a lot of partitions, but for the 90th percentile it would be just one. Even the 99th

Revisit Cassandra EOL Policy

2016-01-05 Thread Anuj Wadehra
Hi, As per my understanding, a Cassandra version n is implicitly declared EOL when two major versions are released after version n, i.e. when version n + 2 is released. I think the EOL policy must be revisited in the interest of the expanding Cassandra user base. Concerns with current EOL

Re: compaction_throughput_mb_per_sec

2016-01-05 Thread Ken Hancock
As to why I think it's cluster-wide, here's what the documentation says: https://docs.datastax.com/en/cassandra/1.2/cassandra/configuration/configCassandra_yaml_r.html compaction_throughput_mb_per_sec

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
Hi Jack, Thanks for your response. My answers are inline... On Tue, Jan 5, 2016 at 11:52 AM, Jack Krupansky wrote: > Jim, I don't quite get why you think you would need to query 50 partitions to return merely hundreds or thousands of rows. Please elaborate. I mean,

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
Hi Nate, Yes, I've been thinking about treating customers as either small or big, where "small" ones have a single partition and big ones have 50 (or whatever number I need to keep sizes reasonable). There's still the problem of how to handle a small customer who becomes too big, but that will

Re: compaction_throughput_mb_per_sec

2016-01-05 Thread Robert Coli
On Tue, Jan 5, 2016 at 6:50 AM, Ken Hancock wrote: > As to why I think it's cluster-wide, here's what the documentation says: > Do you see "system" used in place of "cluster" anywhere else in the docs? I think you are correct that the docs should standardize on

Re: compaction_throughput_mb_per_sec

2016-01-05 Thread Ken Hancock
Will do. I searched the docs for other uses of the term "system": commitlog_segment_size_in_mb refers to "every table in the system", and concurrent_writes talks about CPU cores "in your system". That's it for "system", other than compaction_throughput_mb_per_sec, which refers to "across the

Cassandra Performance on a Single Machine

2016-01-05 Thread Anurag Khandelwal
Hi, I’ve been benchmarking Cassandra to get an idea of how performance scales with more data on a single machine. I just wanted to get some feedback on whether these are the numbers I should expect. The benchmarks are quite simple: I measure the latency and throughput for two kinds of

Re: Requesting some details for my use case

2016-01-05 Thread Bhuvan Rawal
I understand, Ravi. We have our application layers well defined. The major changes will be in the database access layers, and the entities will change. The schema will be modified to tune the efficiency of the chosen data store. We have been using Mongo as a cache for a long time now, but as it's a

RE: Basic query in setting up secure inter-dc cluster

2016-01-05 Thread Singh, Abhijeet
Security is a very broad concept. What exactly do you want to achieve? From: Ajay Garg [mailto:ajaygargn...@gmail.com] Sent: Wednesday, January 06, 2016 11:27 AM To: user@cassandra.apache.org Subject: Basic query in setting up secure inter-dc cluster Hi All. We have a 2*2 cluster deployed, but

Re: opscenter doesn't work with cassandra 3.0

2016-01-05 Thread Wills Feng
Hi, when I try to connect to a Cassandra 3.0 cluster in OpsCenter, I get an error in the OpsCenter log (see below): ''Control connection failed to connect, shutting down Cluster: ('Unable to connect to any servers', {u'54.187.25.239': ProtocolError("Unexpected response during Connection setup:

Basic query in setting up secure inter-dc cluster

2016-01-05 Thread Ajay Garg
Hi All. We have a 2*2 cluster deployed, but no security as of now. As a first stage, we wish to implement inter-dc security. Is it possible to enable security one machine at a time? For example, let's say the machines are DC1M1, DC1M2, DC2M1, DC2M2. If I make the changes JUST IN DC2M2 and

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jonathan Haddad
You could keep a "num_buckets" value associated with the client's account, which can be adjusted accordingly as usage increases. On Tue, Jan 5, 2016 at 2:17 PM Jim Ancona wrote: > On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin < > clintlmar...@coolfiretechnologies.com>

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin < clintlmar...@coolfiretechnologies.com> wrote: > What sort of data is your clustering key composed of? That might help some > in determining a way to achieve what you're looking for. > Just a UUID that acts as an object identifier. > > Clint > On Jan
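This is a minimal Python sketch of the bucketing idea discussed in this thread, under stated assumptions. Each customer carries a num_buckets value (Jonathan Haddad's suggestion), and small customers get exactly one bucket. Each row's bucket is derived by hashing its UUID clustering key. The 50 MB target comes from the thread. The function names and the SHA-256-based hash are illustrative assumptions, not a Cassandra or driver API.

```python
import hashlib
import math
import uuid

# Target ceiling per partition; 50 MB is the figure used in the thread.
TARGET_PARTITION_MB = 50


def buckets_needed(customer_data_mb: float, target_mb: float = TARGET_PARTITION_MB) -> int:
    """Smallest num_buckets that, with an even hash spread, keeps buckets near target_mb.

    Small customers get 1, i.e. a single partition.
    """
    return max(1, math.ceil(customer_data_mb / target_mb))


def bucket_for(object_id: uuid.UUID, num_buckets: int) -> int:
    """Deterministically map a row's UUID clustering key to a bucket in [0, num_buckets)."""
    digest = hashlib.sha256(object_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def partition_keys(customer_id: str, num_buckets: int) -> list[tuple[str, int]]:
    """Every (customer_id, bucket) partition that a full read of the customer must touch."""
    return [(customer_id, b) for b in range(num_buckets)]
```

When num_buckets is 1, a customer's read touches exactly one partition, which covers the 99% case. Only large customers pay the fan-out cost, for example 50 partitions for a hypothetical 2,500 MB customer. Changing num_buckets reassigns rows to different buckets, so a customer who grows past the threshold still needs a migration step, and this sketch does not address that.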

Re: compaction_throughput_mb_per_sec

2016-01-05 Thread Jack Krupansky
I forwarded a comment to the docs team. It appears that they picked the language up from the cassandra.yaml file itself. Looking at use of system in that file, it seems that it usually means the node, the box running the node. -- Jack Krupansky On Tue, Jan 5, 2016 at 9:50 AM, Ken Hancock
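A rough Python illustration shows why the per-node reading matters. Assume that compaction_throughput_mb_per_sec throttles each node independently, as Jack's reading of "system" suggests. The cluster-wide ceiling is then the per-node value multiplied by the node count. The function name is hypothetical. A value of 0 disables throttling in cassandra.yaml.

```python
def cluster_compaction_ceiling_mb_s(per_node_mb_s: float, node_count: int) -> float:
    """Aggregate compaction throughput ceiling across a cluster, in MB/s.

    Assumes compaction_throughput_mb_per_sec is enforced on each node
    independently. A setting of 0 disables throttling, so there is no ceiling.
    """
    if per_node_mb_s < 0 or node_count < 0:
        raise ValueError("throughput and node count must be non-negative")
    if per_node_mb_s == 0:
        return float("inf")
    return per_node_mb_s * node_count
```

For example, a 5-node cluster left at the 2.x default of 16 MB/s could compact at up to 80 MB/s in aggregate under this assumption.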

Node stuck when joining a Cassandra 2.2.0 cluster

2016-01-05 Thread Herbert Fischer
We run a small Cassandra 2.2.0 cluster with 5 nodes on bare-metal servers, and we are going to replace those nodes with other nodes. I planned to add all the new nodes first, one by one, and later remove the old ones, one by one. However, the first new node gets stuck when joining the cluster. I

Requesting some details for my use case

2016-01-05 Thread Bhuvan Rawal
Hi All, I'm planning to shift from a SQL database to a columnar NoSQL database, and we have narrowed our choices to Cassandra and HBase. I would really appreciate it if someone with decent experience with both could give me an honest comparison on the parameters below (links to neutral benchmarks/blogs also

Re: Best way to get Cassandra status in Bash

2016-01-05 Thread Stephen Baynes
I did something like this in Perl. What you want to know is whether the server will respond to CQL; if it does, it is ready to use. The Bash equivalent of what I did would be to use: cqlsh < /dev/null if $? ... Stephen On 4 January 2016 at 15:56, Giovanni Usai wrote: > Hello
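The same readiness check can be sketched in Python with subprocess. The sketch assumes, as the post implies, that cqlsh exits with a non-zero status when it cannot connect. The function names, the per-attempt timeout, and the polling interval are illustrative assumptions.

```python
import subprocess
import time


def cassandra_ready(cmd=("cqlsh",), timeout_s=10.0):
    """Return True if `cmd` exits with status 0 when given an empty stdin.

    Python analogue of `cqlsh < /dev/null` followed by checking $?.
    A missing binary or a hung process counts as "not ready".
    """
    try:
        result = subprocess.run(
            list(cmd),
            stdin=subprocess.DEVNULL,   # equivalent of `< /dev/null`
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def wait_until_ready(cmd=("cqlsh",), deadline_s=120.0, interval_s=2.0):
    """Poll cassandra_ready() until it succeeds or deadline_s elapses."""
    deadline = time.monotonic() + deadline_s
    while True:
        if cassandra_ready(cmd):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Passing the command as a tuple makes it easy to add host and port arguments to cqlsh. It also lets you substitute a stand-in command when testing the polling logic.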

Re: New node stuck on joining Cluster (Cassandra 2.2.0)

2016-01-05 Thread Herbert Fischer
Please ignore. On 5 January 2016 at 11:48, Herbert Fischer wrote: > We run a small Cassandra 2.2.0 cluster, with 5 nodes, on barebone servers > and we are going to replace those nodes with other nodes. I planned to add > all the new nodes first, one-by-one, and

New node stuck on joining Cluster (Cassandra 2.2.0)

2016-01-05 Thread Herbert Fischer
We run a small Cassandra 2.2.0 cluster with 5 nodes on barebone servers, and we are going to replace those nodes with other nodes. I planned to add all the new nodes first, one by one, and later remove the old ones, one by one. However, the first new node gets stuck when joining the cluster. I

Re: Requesting some details for my use case

2016-01-05 Thread Bhuvan Rawal
Thanks, Jack, for the detailed advice. Yes, it is a Java application. We already have a denormalized view of our data in place, which we use for storing it in MongoDB as a cache; however, we will get our hands dirty before implementation. We would like to have a single DB view and replace MongoDB &

Re: Requesting some details for my use case

2016-01-05 Thread Jonathan Haddad
Sorry to nitpick, but Cassandra is not a columnar database. If you're looking for columnar because you have an analytics need, Cassandra is not what you want. If you've just made the same mistake that 99% of people make, well, now you know. Cassandra historically has been referred to as a

Re: Requesting some details for my use case

2016-01-05 Thread Jack Krupansky
DataStax has documented quite a few customers/case studies: http://www.datastax.com/resources/casestudies Materialized Views should be considered if you can go straight to 3.0, but you can always do the same synthesized views yourself in your app, which is current standard best practice anyways.
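Here is a minimal Python sketch of a view synthesized by the application itself. In-memory dicts stand in for two hypothetical tables, users_by_id and users_by_country. The sketch shows the bookkeeping that a 3.0 materialized view would otherwise do for you: when the view's key column changes, the stale view row must be deleted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    email: str
    country: str


class UserStore:
    """In-memory stand-in for two hypothetical tables:
    users_by_id (partition key user_id) and users_by_country (partition key country)."""

    def __init__(self):
        self.users_by_id: dict[str, User] = {}
        self.users_by_country: dict[str, dict[str, User]] = defaultdict(dict)

    def upsert(self, user: User) -> None:
        old = self.users_by_id.get(user.user_id)
        if old is not None and old.country != user.country:
            # The view's partition key changed: remove the stale view row.
            del self.users_by_country[old.country][old.user_id]
        # Write the base table and the denormalized view together.
        self.users_by_id[user.user_id] = user
        self.users_by_country[user.country][user.user_id] = user

    def by_country(self, country: str) -> list[User]:
        # .get() avoids creating empty partitions on read.
        return list(self.users_by_country.get(country, {}).values())
```

Against a real cluster, the paired writes would typically be grouped in a logged BATCH so that both tables eventually receive them.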

Re: Requesting some details for my use case

2016-01-05 Thread Bhuvan Rawal
Thanks for pointing out the typo, Jonathan. Our use case is of the column-family kind. :) On Wed, Jan 6, 2016 at 2:38 AM, Jonathan Haddad wrote: > Sorry to nitpick, but Cassandra is not a columnar database. If you're > looking for columnar because you have an analytics need,

Re: Node stuck when joining a Cassandra 2.2.0 cluster

2016-01-05 Thread Robert Coli
On Tue, Jan 5, 2016 at 3:01 AM, Herbert Fischer < herbert.fisc...@crossengage.io> wrote: > We run a small Cassandra 2.2.0 cluster, with 5 nodes, on bare-metal > servers and we are going to replace those nodes with other nodes. I planned > to add all the new nodes first, one-by-one, and later

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Clint Martin
What sort of data is your clustering key composed of? That might help some in determining a way to achieve what you're looking for. Clint On Jan 5, 2016 2:28 PM, "Jim Ancona" wrote: > Hi Nate, > > Yes, I've been thinking about treating customers as either small or big, >

Re: Requesting some details for my use case

2016-01-05 Thread Ravi Krishna
You are moving from a SQL database to C*??? I hope you are aware of the differences between a NoSQL database like C* and an RDBMS. To keep it short, the app has to change significantly. Please read the documentation on the differences between NoSQL and RDBMS. Thanks. On Tue, Jan 5, 2016 at 6:20 AM, Bhuvan Rawal