Cassandra is consuming a lot of disk space

2016-01-12 Thread Rahul Ramesh
We have a 2 node Cassandra cluster with a replication factor of 2. The load factor on the nodes is around 350 GB. Datacenter: Cassandra == Address Rack Status State Load Owns Token -5072018636360415943 172.31.7.91 rack1 Up Normal 328.5 GB

Re: Sorting & pagination in apache cassandra 2.1

2016-01-12 Thread Jonathan Haddad
The clustering keys determine the sorting of rows within a partition. The partitions within a file are sorted by their token (usually computed by applying the Murmur3 hash to the partition key). If you are using a version of Cassandra < 3.0, you'll need to maintain your own materialized view
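
The two-level ordering described above can be modelled in a few lines of Python. This is only a rough sketch: `stand_in_token` is a hypothetical placeholder built on MD5, not the Murmur3 function Cassandra actually uses, and `on_disk_order` is an illustrative name, not part of any Cassandra API. It shows partitions ordered by token and rows ordered by clustering key inside each partition.

```python
import hashlib
from collections import defaultdict


def stand_in_token(partition_key: str) -> int:
    """Hypothetical stand-in for the partitioner's token function (NOT Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a signed 64-bit integer, like a token.
    return int.from_bytes(digest[:8], "big", signed=True)


def on_disk_order(rows):
    """Order (partition_key, clustering_key, value) rows the way the text describes.

    Partitions are ordered by token; rows inside a partition are ordered
    by clustering key.
    """
    partitions = defaultdict(list)
    for partition_key, clustering_key, value in rows:
        partitions[partition_key].append((clustering_key, value))

    ordered = []
    for partition_key in sorted(partitions, key=stand_in_token):
        # Sort on the clustering key only, so values never need to be comparable.
        for clustering_key, value in sorted(partitions[partition_key], key=lambda r: r[0]):
            ordered.append((partition_key, clustering_key, value))
    return ordered
```

Note that the partition order follows the hash, not the partition key itself, which is why a range scan across partitions does not come back in key order.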

Re: Cassandra is consuming a lot of disk space

2016-01-12 Thread Kevin O'Connor
Have you tried restarting? It's possible there are open file handles to sstables that have been compacted away. You can verify by doing lsof and grepping for DEL or deleted. If it's not that, you can run nodetool cleanup on each node to scan all of the sstables on disk and remove anything that it's
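
For anyone who prefers a script to lsof, the same check can be sketched in Python on Linux by reading the symlinks under /proc. This is a minimal sketch under stated assumptions: `deleted_open_files` and its `proc_root` parameter are hypothetical names introduced here, and it relies on the Linux convention that the link target of a deleted but still open file ends in " (deleted)".

```python
import os

DELETED_SUFFIX = " (deleted)"


def deleted_open_files(pid, proc_root="/proc"):
    """Return paths that process `pid` still holds open after deletion (Linux only)."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    deleted = []
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # descriptor was closed while we were scanning
        if target.endswith(DELETED_SUFFIX):
            deleted.append(target[: -len(DELETED_SUFFIX)])
    return sorted(deleted)
```

Calling `deleted_open_files(<cassandra pid>)` as the user that owns the process (or as root) should list any sstable files that are gone from the directory but still pinned on disk by an open handle.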

Re: Sorting & pagination in apache cassandra 2.1

2016-01-12 Thread anuja jain
I understand the meaning of SSTable, but what's the reason behind sorting the table on the basis of int columns first? Is there any data type preference in Cassandra? Also, what is the alternative to creating materialized views if my Cassandra version is prior to 3.0 (specifically 2.1) and which is

Re: In UJ status for over a week trying to rejoin cluster in Cassandra 3.0.1

2016-01-12 Thread DuyHai Doan
What is your Cassandra version? In earlier versions there were some issues with streaming that could make the joining process get stuck. On Mon, Jan 11, 2016 at 6:57 AM, Carlos A wrote: > Hello all, > > I have a small dev environment with 4 machines. One of them, I had it >

Re: what consistency level should I set when using IF NOT EXIST or UPDATE IF statements ?

2016-01-12 Thread DuyHai Doan
There are 2 consistency levels you can define on your query when using Lightweight Transactions: - one for the Paxos round: SERIAL or LOCAL_SERIAL (which indeed corresponds to QUORUM/LOCAL_QUORUM but is named differently so people do not get confused) - one for the consistency of the

Re: ClosedChannelExcption while nodetool repair

2016-01-12 Thread Paulo Motta
You may be running into https://issues.apache.org/jira/browse/CASSANDRA-10961, which will be fixed in 2.2.5. In the meantime, you may replace your cassandra jar with a snapshot version available in that issue. 2016-01-12 10:38 GMT-03:00 Jan Kesten : > Hi, > > I have some

Re: Modeling contact list, plain table or List

2016-01-12 Thread DuyHai Doan
--> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND contactname= ? Answer: Because a contact name can be duplicated. Or should I force unique contact names? In this case, add contactid as an extra clustering column to guarantee uniqueness for your contact. The delete query

ClosedChannelExcption while nodetool repair

2016-01-12 Thread Jan Kesten
Hi, I have some problems recently on my cassandra cluster. I am running 12 nodes with 2.2.4 and while repairing with a plain "nodetool repair". In system.log I can find ERROR [STREAM-IN-/172.17.2.233] 2016-01-08 08:32:38,327 StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef]

[RELEASE] Apache Cassandra 3.2 released

2016-01-12 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra version 3.2. Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high availability without compromising performance. http://cassandra.apache.org/ Downloads of source and

Re: [RELEASE] Apache Cassandra 3.2 released

2016-01-12 Thread Jake Luciani
Note: I made a mistake saying this is a bug fix release, it's a feature release that includes bugfixes. On Tue, Jan 12, 2016 at 8:46 AM, Jake Luciani wrote: > > The Cassandra team is pleased to announce the release of Apache Cassandra > version 3.2. > > Apache Cassandra is a

Re: Too many compactions, maybe keyspace system?

2016-01-12 Thread Robert Coli
On Mon, Jan 11, 2016 at 9:12 PM, Shuo Chen wrote: > I have a assumption that, lots of pending compaction tasks jam the memory > and raise full gc. The full chokes the process and slows down compaction. > And this causes more pending compaction tasks and more pressure on

Re: Sorting & pagination in apache cassandra 2.1

2016-01-12 Thread Carlos Alonso
Hi Anuja. Cassandra saves records on disk sorted by the clustering column. In this case you haven't selected any, but it looks like it is picking birth_year as the clustering column. I don't know what the clustering column selection algorithm is, though (maybe alphabetically by name?). Regards

Re: Modeling contact list, plain table or List

2016-01-12 Thread DuyHai Doan
1) SELECT all rows from user_contact excluding the one that the user wants to get rid of. 2) DELETE all the user_contact rows for that particular user. 3) INSERT the result of 1). --> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND contactname= ? The Materialized View will

Re: In UJ status for over a week trying to rejoin cluster in Cassandra 3.0.1

2016-01-12 Thread DuyHai Doan
Oh, sorry, did not notice the version in the title. Did you check the system.log to verify that there isn't any exception related to data streaming? What is the output of "nodetool tpstats"? On Tue, Jan 12, 2016 at 1:00 PM, DuyHai Doan wrote: > What is your Cassandra

Re: Modeling contact list, plain table or List

2016-01-12 Thread I PVP
--> Why don't you do: DELETE FROM user_contact WHERE userid=xxx AND contactname= ? Answer: Because a contact name can be duplicated. Or should I force unique contact names? Overall, the challenge seems to be addressed, with some trade-off on the "ordering by contact name". If, at the

Re: Recommendations for an embedded Cassandra and Unit Tests

2016-01-12 Thread DuyHai Doan
"What I'm noticing with these projects is that they don't handle CQL files properly" --> your concern is very legit. But handling CQL files properly is very complex, let me explain the reasons. A naive solution if you want to handle CQL syntax is to re-use the ANTLR grammar file here:

Re: [Typo correction] Is it good for performance to put rows that are of different types but are always queried together in the same table partition?

2016-01-12 Thread Carlos Alonso
Why can't you have something like this? CREATE TABLE t ( p INT, q1 INT, q2 UUID, c1 INT, c2 TEXT, PRIMARY KEY (p, q1, q2) ) Sounds like the simplest solution. Carlos Alonso | Software Engineer | @calonso On 12 January 2016 at 18:27, Bamoqi

Re: electricity outage problem

2016-01-12 Thread daemeon reiydelle
This happens when there is insufficient time for nodes coming up to join a network. It takes a few seconds for a node to come up, e.g. your seed node. If you tell a node to join a cluster you can get this scenario because of high network utilization as well. I wait 90 seconds after the first (i.e.

Re: Upgrade from 2.0.x to 2.2.x documentation missing

2016-01-12 Thread Michael Shuler
On 01/12/2016 01:07 AM, Amit Singh F wrote: > We are currently at *Cassandra 2.0.14* in production and since it going > to be EOL soon so we are planning to upgrade it to *Cassandra 2.2.4* > (http://cassandra.apache.org/download/) which is the currently > production ready version. While doing some

electricity outage problem

2016-01-12 Thread Adil
Hi, we have two DCs with 5 nodes each. Yesterday there was an electricity outage that took all nodes down. We restarted the clusters, but when we run nodetool status on DC1 it reports that some nodes are DN. The strange thing is that running the command from a different node in DC1 doesn't give

Seed Private / Public Broadcast IP

2016-01-12 Thread Asher Newcomer
Hi all, I am currently running a multi-region setup in AWS. I have a single cluster across two datacenters in different regions. In order to communicate cross-region in AWS, I have my broadcast_address set to public IPs and my listen_address set to the instance's private IP. I believe that this

Cassandra 1.2.19 and Java 8

2016-01-12 Thread Tim Heckman
Hello, We still have an installation of Cassandra on the 1.2.19 release, running on Java 7. We do plan on upgrading to a newer version, but in the meantime there have been some questions internally about running 1.2 on Java 8 until the upgrade can be fully completed. I seem to remember speaking

Re: Cassandra 1.2.19 and Java 8

2016-01-12 Thread Robert Coli
On Tue, Jan 12, 2016 at 2:31 PM, Tim Heckman wrote: > We still have an installation of Cassandra on the 1.2.19 release, > running on Java 7. We do plan on upgrading to a newer version, but in > the mean time there has been some questions internally about running > 1.2 on Java

Re: Cassandra 1.2.19 and Java 8

2016-01-12 Thread Michael Shuler
On 01/12/2016 04:41 PM, Robert Coli wrote: > On Tue, Jan 12, 2016 at 2:31 PM, Tim Heckman > wrote: > > We still have an installation of Cassandra on the 1.2.19 release, > running on Java 7. We do plan on upgrading to a newer version, but in

Re: [Typo correction] Is it good for performance to put rows that are of different types but are always queried together in the same table partition?

2016-01-12 Thread Bamoqi
I over-simplified the original example. In the real model I cannot just merge the row types. Suppose create table t1( p int, q1 int, c1 int, primary key( p, q1 ) ) create table t2( p int, q2 uuid, c2 text, primary key(

Re: Sorting & pagination in apache cassandra 2.1

2016-01-12 Thread Robert Coli
On Mon, Jan 11, 2016 at 11:30 PM, anuja jain wrote: > 1 more question, what does it mean by "cassandra inherently sorts data"? > SSTable = Sorted Strings Table. It doesn't contain "Strings" anymore, really, but that's a hint.. :) =Rob

Repair with "-pr" and vnodes

2016-01-12 Thread Roman Tkachenko
Hey guys, The documentation for the "-pr" repair option says it repairs only the first range returned by the partitioner. However, with vnodes a node owns a lot of small ranges. Does that mean that if I run rolling "nodetool repair -pr" on the cluster, a whole bunch of ranges remain un-repaired?

Re: Repair with "-pr" and vnodes

2016-01-12 Thread Robert Coli
On Tue, Jan 12, 2016 at 3:46 PM, Roman Tkachenko wrote: > The documentation for the "-pr" repair option says it repairs only the > first range returned by the partitioner. However, with vnodes a node owns a > lot of small ranges. > > Does that mean that if I run rolling
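
To make the vnode question concrete, here is a small Python model of a token ring. It is a sketch under one assumption: that with vnodes, "-pr" covers every range for which a node is the first replica (one range per token the node owns) rather than a single range. `assign_vnodes` and `primary_ranges` are hypothetical helper names, and the tokens are random numbers, not output from a real partitioner.

```python
import random


def assign_vnodes(nodes, tokens_per_node, seed=0):
    """Give each node `tokens_per_node` random, distinct tokens; returns {token: node}."""
    rng = random.Random(seed)
    ring = {}
    for node in nodes:
        assigned = 0
        while assigned < tokens_per_node:
            token = rng.randrange(-2**63, 2**63)
            if token not in ring:  # skip the (very unlikely) collision
                ring[token] = node
                assigned += 1
    return ring


def primary_ranges(ring):
    """Map each node to the (start, end] ranges that end at one of its own tokens."""
    tokens = sorted(ring)
    ranges = {}
    for i, token in enumerate(tokens):
        start = tokens[i - 1]  # i == 0 wraps around to the last token on the ring
        ranges.setdefault(ring[token], []).append((start, token))
    return ranges
```

In this model every token on the ring is the end of exactly one primary range, so the primary ranges of all nodes together tile the whole ring. Under the stated assumption, a rolling "-pr" repair that visits every node would leave no range out.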

Re: electricity outage problem

2016-01-12 Thread Jack Krupansky
Sometimes you may have to clear out the saved Gossip state: https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_gossip_purge.html Note the instruction about bringing up the seed nodes first. Normally seed nodes are only relevant when initially joining a node to a cluster (and then

Re: Slow performance after upgrading from 2.0.9 to 2.1.11

2016-01-12 Thread Peddi, Praveen
Thanks Jeff for your reply. Sorry for the delayed response. We were running some more tests and wanted to wait for the results. So basically we saw that CPU usage with 2.1.11 was higher than with 2.0.9 (see below) for the same exact load test. Memory spikes were also aggressive on 2.1.11. So we