Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Atul Saroha
After further debugging, this issue appears to be in the in-memory memtable, as running nodetool flush + compact resolves it. No batch writes are used for the table that shows the issue. Table properties: WITH CLUSTERING ORDER BY (f_name ASC) > AND bloom_filter_fp_chance = 0.01 > AND

Re: Streaming from 1 node only when adding a new DC

2016-06-14 Thread kurt Greaves
What version of Cassandra are you using? Also what command are you using to run the rebuilds? Are you using vnodes? On 13 June 2016 at 09:01, Fabien Rousseau wrote: > Hello, > > We've tested adding a new DC from an existing DC having 3 nodes and RF=3 > (ie all nodes have

[RELEASE] Apache Cassandra 3.0.7 released

2016-06-14 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra version 3.0.7. Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high availability without compromising performance. http://cassandra.apache.org/ Downloads of source

[RELEASE] Apache Cassandra 3.7 released

2016-06-14 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra version 3.7. Apache Cassandra is a fully distributed database. It is the right choice when you need scalability and high availability without compromising performance. http://cassandra.apache.org/ Downloads of source and

Re: Tick Tock version numbers

2016-06-14 Thread Tyler Hobbs
On Mon, Jun 13, 2016 at 11:59 AM, Francisco Reyes wrote: > > > Can I upgrade them to 3.6 from 3.2? Or is it advisable to upgrade to each > intermediary version? > You can (and should) upgrade directly to 3.6 or 3.7. The 3.7 release is just 3.6 + bugfixes. > > Based on what

Re: Installing Cassandra from Tarball

2016-06-14 Thread Tyler Hobbs
On Mon, Jun 13, 2016 at 11:49 AM, Bhuvan Rawal wrote: > > WARN 15:41:58 Cassandra server running in degraded mode. Is swap >> disabled? : true, Address space adequate? : true, nofile limit adequate? >> : false, nproc limit adequate? : false >> > You need to disable swap
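The three settings named in the warning above (swap, nofile, nproc) can be inspected from a script. The following is a rough Python sketch, not Cassandra's own startup check: the function names are hypothetical, and the default thresholds (100000 open files, 32768 processes) are illustrative assumptions rather than values taken from this thread.

```python
import resource


def is_adequate(limit, minimum):
    """A limit is adequate if it is unlimited or at least the minimum."""
    return limit == resource.RLIM_INFINITY or limit >= minimum


def swap_disabled(meminfo_text):
    """Return True if /proc/meminfo-style text reports a SwapTotal of 0 kB."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1]) == 0
    return True  # no SwapTotal line: treat as no swap configured


def report(meminfo_text, nofile, nproc, min_nofile=100000, min_nproc=32768):
    """Summarise the settings; the minimums are assumed, not official, values."""
    return {
        "swap disabled": swap_disabled(meminfo_text),
        "nofile limit adequate": is_adequate(nofile, min_nofile),
        "nproc limit adequate": is_adequate(nproc, min_nproc),
    }


def current_report():
    """Read the live values on a Linux host (soft limits of this process)."""
    with open("/proc/meminfo") as f:
        meminfo = f.read()
    nofile_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    nproc_soft, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    return report(meminfo, nofile_soft, nproc_soft)
```

Calling `current_report()` as the user that runs Cassandra shows the limits that user's processes actually get, which can differ from those of an interactive root shell.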

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Tyler Hobbs
Is 'id' your partition key? I'm not familiar with the stratio indexes, but it looks like the primary key columns are both indexed. Perhaps this is related? On Tue, Jun 14, 2016 at 1:25 AM, Atul Saroha wrote: > After further debug, this issue is found in in-memory

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Siddharth Verma
id is the partition key and f_name is the clustering key. We weren't querying on the Lucene indexes. The Lucene index is on id and f_d_name (another column). We were facing this issue in production in one column family, due to which we had to downgrade to 3.0.3

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Atul Saroha
Hi Tyler, This issue is mainly visible for tables having static columns; we are still investigating. We will try to test after removing the Lucene index, but I don't think this plug-in could lead to a change in how Cassandra writes to the table's memtable.

Re: Installing Cassandra from Tarball

2016-06-14 Thread Steve Anderson
I think you’re right Tyler; the warning does not appear after making the changes suggested by Bhuvan. Steve — "Surely, those who believe, those who are Jewish, the Christians, and the converts; anyone who (1) believes in God, (2) believes in the Last Day, and (3) leads a righteous life, will

Re: Installing Cassandra from Tarball

2016-06-14 Thread Steve Anderson
Awesome, thanks Bhuvan! I have not worried about the JMX warning at this stage. There were two other warnings, but I assume these are due to the size of my Amazon Linux Image (and are not worth worrying about at this stage). WARN 17:24:20 Small commitlog volume detected at

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Joel Knighton
There's some precedent for similar issues with static columns in 3.5 with https://issues.apache.org/jira/browse/CASSANDRA-11513 - a deterministic (or somewhat deterministic) path for reproduction would help narrow the issue down farther. I've played around locally with similar schemas (sans the

Re: Cassandra monitoring

2016-06-14 Thread Arun Ramakrishnan
Thanks Jonathan. Out of curiosity, does OpsCenter support some later version of Cassandra that is not OSS? The minimal requirement is that I want to be able to monitor cluster health and hook this information up to some alerting platform. We are AWS heavy. We just rely heavily on AWS

Re: Cassandra monitoring

2016-06-14 Thread Jonathan Haddad
OpsCenter going forward is limited to DataStax Enterprise versions. I know a lot of people like DataDog, but I haven't used it. Maybe other people on the list can speak from recent first-hand experience on its pros and cons. On Tue, Jun 14, 2016 at 1:20 PM Arun Ramakrishnan <

Re: Streaming from 1 node only when adding a new DC

2016-06-14 Thread Fabien Rousseau
We've tested with C* version 2.1.14. Yes, vnodes with 256 tokens. Once all the nodes in dc2 are added, the schema is modified to have RF=3 in dc1 and RF=3 in dc2. Then on each node of dc2: nodetool rebuild dc1 On 14 June 2016 at 10:39, "kurt Greaves" wrote: > What version of

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
Jira CASSANDRA-12003 Has been created for the same. On Tue, Jun 14, 2016 at 11:54 PM, Atul Saroha wrote: > Hi Tyler, > > This issue is mainly visible for tables having static columns, still > investigating. > We

Cassandra monitoring

2016-06-14 Thread Arun Ramakrishnan
What are the options for a very small and nimble startup to keep a Cassandra cluster running well oiled? We are on AWS. We are interested in a monitoring tool and potentially also cluster management tools. We are currently on Apache Cassandra 3.7. We were hoping the DataStax OpsCenter would be

Re: How to print out the metrics information, like compaction and garbage collection?

2016-06-14 Thread Otis Gospodnetić
Hi Jun, Here's a tool for dumping JMX contents: https://github.com/sematext/jmxc Here's a tool/service for monitoring Cassandra: https://sematext.com/spm/integrations/cassandra-monitoring/ Otis -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
I can reproduce CASSANDRA-11513 locally on 3.5, possible duplicate. On Wed, Jun 15, 2016 at 12:29 AM, Joel Knighton wrote: > There's some precedent for similar issues with static columns in 3.5 with >

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
Joel, If we look at the schema carefully: CREATE TABLE test0 ( pk int, a int, b text, s text static, PRIMARY KEY (pk, a) ); filtering is performed on clustering column a, and it's not a static column: select * from test0 where pk=0 and a=2; On Wed, Jun 15, 2016 at
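To make the expectation behind this schema concrete, here is a small Python model of test0. It is only a sketch of the intended semantics, not of Cassandra's read path: the helper names and the sample values are hypothetical. The static column s holds one value per partition, while (pk, a) identifies each regular row, so a query on the full primary key should match at most one row.

```python
def materialise(static_by_pk, regular_by_key):
    """Join the per-partition static value onto every clustering row."""
    return [
        {"pk": pk, "a": a, "b": b, "s": static_by_pk.get(pk)}
        for (pk, a), b in sorted(regular_by_key.items())
    ]


def select(rows, pk, a):
    """Model of: select * from test0 where pk=<pk> and a=<a>;"""
    return [row for row in rows if row["pk"] == pk and row["a"] == a]


# Hypothetical sample data for partition pk=0.
static_s = {0: "shared"}                                     # s: one value per pk
regular_rows = {(0, 1): "x", (0, 2): "y", (0, 3): "z"}     # (pk, a) -> b

rows = materialise(static_s, regular_rows)
result = select(rows, pk=0, a=2)
print(result)
```

Because the dictionary key (pk, a) is unique, `result` can never hold more than one row here; a database returning several rows for the same query is breaking that uniqueness guarantee.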

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
I have verified this issue to be fixed in 3.6 and 3.7. And the issue mentioned on this thread is fixed as well. On Wed, Jun 15, 2016 at 12:43 AM, Bhuvan Rawal wrote: > Joel, > > If we look at the schema carefully: > > CREATE TABLE test0 ( > pk int, > a int, > b

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Joel Knighton
It doesn't seem to be an exact duplicate - CASSANDRA-11513 relies on you selecting a static column, which you weren't doing in the reported issue. That said, I haven't looked too closely. On Tue, Jun 14, 2016 at 2:07 PM, Bhuvan Rawal wrote: > I can reproduce CASSANDRA-11513

Re: Cassandra monitoring

2016-06-14 Thread Jonathan Haddad
Depends what you want to monitor. I wouldn't use a lesser version of Cassandra for OpsCenter, it doesn't give you a ton you can't get elsewhere and it's not ever going to support OSS > 2.1, so you kind of limit yourself to a pretty old version of Cassandra for a non-good reason. What else do you

RE: How to print out the metrics information, like compaction and garbage collection?

2016-06-14 Thread Jun Wu
Hi Otis, Thank you so much for the reply. I do appreciate it. Actually, I tried the Sematext SPM a few days ago :) It works well and is easy to deploy. However, in the monitoring output figures for the different metrics, the interval is 1 minute, which is longer than I want. What I want is

Re: Cassandra monitoring

2016-06-14 Thread Michał Łowicki
My team ended up with Diamond / StatsD / Graphite / Grafana (more background in medium.com/@mlowicki/alternatives-to-datastax-opscenter-8ad893efe063). We rely heavily on this stack in other projects and in our infrastructure in general. On Tue, Jun 14, 2016 at 10:29 PM, Jonathan Haddad

Re: Data lost in Cassandra 3.5 single instance via Erlang driver

2016-06-14 Thread linbo liao
I am not sure, but it looks like it will cause an update rather than an insert. If that is true, is the only way to include IF NOT EXISTS in the request and inform the client that it failed? Thanks, Linbo 2016-06-15 10:59 GMT+08:00 Ben Slater : > Is it possible that your pub_timestamp values are

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
Joel, I'd rather thank you for naming 11513 earlier in the mail; I would have been lost in the code for a much longer time otherwise. Repeating what Tianshi mentioned in 11513 - "Cassandra community is awesome! Should buy you a beer, Joel." :) On Wed, Jun 15, 2016 at 6:01 AM, Joel Knighton

Data lost in Cassandra 3.5 single instance via Erlang driver

2016-06-14 Thread linbo liao
Hi, I use the Erlang driver to send data to Cassandra, and while testing in a local environment I ran into a data loss issue. I have no idea which step is wrong. Environment: 1. Ubuntu 12.04 LTS x64bit 2. Cassandra 3.5 single instance, not a cluster, installed via the official installation document, and didn't

Re: Data lost in Cassandra 3.5 single instance via Erlang driver

2016-06-14 Thread Paul Fife
If pub_timestamp could possibly match I'd suggest making it a timeuuid type instead. With the above schema it's not a failure or data loss if the timestamp is duplicated - your writes all probably made it - the duplicates just got overwritten. On Tue, Jun 14, 2016 at 9:40 PM, linbo liao
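The overwrite behaviour described above can be illustrated with a minimal sketch, written here in Python rather than CQL. It models a table as a dictionary keyed by primary key, so a second write with the same key replaces the first, much as a Cassandra INSERT acts as an upsert. The `topic` column and the sample values are hypothetical, since the original schema is not shown in full; `uuid.uuid1()` stands in for a `timeuuid` value.

```python
import uuid


def write(table, primary_key, value):
    """Model a Cassandra write: an existing primary key is overwritten."""
    table[primary_key] = value


# Five messages, but only three distinct millisecond timestamps (made-up data).
messages = [
    ("news", 1465960000001, "a"),
    ("news", 1465960000001, "b"),  # same millisecond as "a"
    ("news", 1465960000002, "c"),
    ("news", 1465960000003, "d"),
    ("news", 1465960000003, "e"),  # same millisecond as "d"
]

# Primary key (topic, pub_timestamp): colliding timestamps collapse into one row.
by_timestamp = {}
for topic, ts, body in messages:
    write(by_timestamp, (topic, ts), body)

# Primary key (topic, time-based UUID): every write gets its own key.
by_timeuuid = {}
for topic, _ts, body in messages:
    write(by_timeuuid, (topic, uuid.uuid1()), body)

print(len(by_timestamp))  # 3
print(len(by_timeuuid))   # 5
```

All five writes succeed in both cases; with the timestamp key, two of them simply replace earlier rows, which looks like data loss when counting rows afterwards.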

Re: Data lost in Cassandra 3.5 single instance via Erlang driver

2016-06-14 Thread Ben Slater
Is it possible that your pub_timestamp values are colliding (which would result in an update rather than an insert)? On Wed, 15 Jun 2016 at 12:55 linbo liao wrote: > Hi, > > I use Erlang driver to send data to Cassandra, do testing at local > environment meet data lost issue.

Re: Data lost in Cassandra 3.5 single instance via Erlang driver

2016-06-14 Thread Alain Rastoul
On 15/06/2016 06:40, linbo liao wrote: I am not sure, but looks it will cause the update other than insert. If it is true, the only way is request includes IF NOT EXISTS, inform the client it failed? Thanks, Linbo Hi Linbo, +1 with what Ben said, timestamp has a millisecond precision and is

Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-14 Thread Gaurav Bhatnagar
try setting the option --driver-memory 4G On Tue, Jun 14, 2016 at 3:52 PM, Ben Slater wrote: > A high level shot in the dark but in our testing we found Spark 1.6 a lot > more reliable in low memory situations (presumably due to >

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Bhuvan Rawal
Joel, Thanks for your reply. I have checked and found that the behavior is the same in the case of CASSANDRA-11513. I have verified that this behavior (for both 11513 & 12003) occurs in 3.4 & 3.5. Neither occurs in 3.0.4, 3.6 & 3.7.

Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-14 Thread Cassa L
Hi, I would appreciate any clue on this. It has become a bottleneck for our spark job. On Mon, Jun 13, 2016 at 2:56 PM, Cassa L wrote: > Hi, > > I'm using spark 1.5.1 version. I am reading data from Kafka into Spark and > writing it into Cassandra after processing it. Spark

Re: select query on entire primary key returning more than one row in result

2016-06-14 Thread Joel Knighton
Great work, Bhuvan - I sat down after work to look at this more carefully. For a short summary, you are correct. For a longer summary, I initially thought the reproduction you provided would not run into the issue from 3.4/3.5 because it didn't select any static columns, which meant that it

Re: Spark Memory Error - Not enough space to cache broadcast

2016-06-14 Thread Ben Slater
A high level shot in the dark but in our testing we found Spark 1.6 a lot more reliable in low memory situations (presumably due to https://issues.apache.org/jira/browse/SPARK-1). If it’s an option, probably worth a try. Cheers Ben On Wed, 15 Jun 2016 at 08:48 Cassa L