3k sstables during a repair incremental !!

2016-02-10 Thread Jean Carlo
Hello guys! I am testing incremental repair in my Cassandra cluster. I am doing my test over these tables *CREATE TABLE pns_nonreg_bench.cf3* ( s text, sp int, d text, dp int, m map, t timestamp, PRIMARY KEY (s, sp, d, dp) ) WITH CLUSTERING ORDER BY (sp ASC, d

Select values from map with multiple key values in where clause

2016-02-10 Thread Matteo Rulli
Hello! I have a table like the following CREATE TABLE test ( my_id uuid, values map, PRIMARY KEY (my_id) )... and I would like to perform a query along these lines: SELECT * FROM test WHERE my_id = 670287fe-e080-42c3-9fae-7ffdc3309793 AND values CONTAINS KEY

Re: 3k sstables during a repair incremental !!

2016-02-10 Thread horschi
Hi Jean, which Cassandra version do you use? Incremental repair got much better in 2.2 (for us at least). Kind regards, Christian On Wed, Feb 10, 2016 at 2:33 PM, Jean Carlo wrote: > Hello guys! > > I am testing incremental repair in my Cassandra cluster. I am doing my

Re: 3k sstables during a repair incremental !!

2016-02-10 Thread horschi
Hi Jean, we had the same issue, but on SizeTieredCompaction. During repair the number of SSTables and pending compactions exploded. It not only affected latencies; at some point Cassandra also ran out of heap. After the upgrade to 2.2 things got much better. Regards, Christian On Wed, Feb

Re: 3k sstables during a repair incremental !!

2016-02-10 Thread Jean Carlo
Correction: *table cf3* *Space used (live): 697.03 MB* It happens that when I do repair -inc -par on these tables, *cf3 hits a peak of 3k sstables*. When the repair finishes, it takes 30 min or more to finish all the compactions and return to 6 sstables. Saludos Jean Carlo "The best

Re: best ORM for cassandra

2016-02-10 Thread Jim Ancona
Recent versions of the Datastax Java Driver include an object mapping API that might work for you: http://docs.datastax.com/en/latest-java-driver/java-driver/reference/objectMappingApi.html Jim On Wed, Feb 10, 2016 at 4:29 AM, Nirmallya Mukherjee wrote: > I have heard of

Re: best ORM for cassandra

2016-02-10 Thread DuyHai Doan
For advanced object mapping you can look also at Achilles: www.achilles.io On Wed, Feb 10, 2016 at 3:21 PM, Jim Ancona wrote: > Recent versions of the Datastax Java Driver include an object mapping API > that might work for you: > >

Re: 3k sstables during a repair incremental !!

2016-02-10 Thread Jean Carlo
Hello Horschi, Yes I understand. Thx Best regards Jean Carlo "The best way to predict the future is to invent it" Alan Kay On Wed, Feb 10, 2016 at 3:00 PM, horschi wrote: > btw: I am not saying incremental Repair in 2.1 is broken, but ... ;-) > > On Wed, Feb 10, 2016

Re: 3k sstables during a repair incremental !!

2016-02-10 Thread Jean Carlo
Hi Horschi! I am on 2.1.12. But I think it is something related to the leveled compaction strategy. It is impressive that we went from 6 sstables to 3k sstables. I think this will affect latency in production because of the number of compactions going on. Best regards Jean Carlo "The best

Re: 3k sstables during a repair incremental !!

2016-02-10 Thread horschi
btw: I am not saying incremental Repair in 2.1 is broken, but ... ;-) On Wed, Feb 10, 2016 at 2:59 PM, horschi wrote: > Hi Jean, > > we had the same issue, but on SizeTieredCompaction. During repair the > number of SSTables and pending compactions were exploding. > > It not

Re: Select values from map with multiple key values in where clause

2016-02-10 Thread DuyHai Doan
It's not possible to have multiple keys in the CONTAINS KEY clause. Right now it is not possible to use a UDF in the WHERE clause; it may become possible one day. But you can use a UDF in the SELECT clause to filter out data. In this case, you'll need to wait for JIRA
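Since the reply above rules out several keys in one CONTAINS KEY clause, one fallback is to fetch the row by its partition key and filter the map on the client. This is a different technique from the SELECT-clause UDF mentioned in the reply. The sketch below is a minimal illustration, assuming rows have already been fetched into plain dicts with a "values" entry holding the map; the function names are hypothetical and not part of any driver API.

```python
from typing import Any, Dict, Iterable, List


def select_map_entries(values: Dict[str, Any], wanted_keys: Iterable[str]) -> Dict[str, Any]:
    """Keep only the map entries whose key is one of wanted_keys."""
    wanted = set(wanted_keys)
    return {key: value for key, value in values.items() if key in wanted}


def rows_containing_all_keys(rows: List[Dict[str, Any]], wanted_keys: Iterable[str]) -> List[Dict[str, Any]]:
    """Keep only the rows whose map holds every key in wanted_keys.

    Assumption: each row is a dict with a "values" entry that is either
    a dict or None (an empty collection may come back as None).
    """
    wanted = set(wanted_keys)
    return [row for row in rows if wanted.issubset((row.get("values") or {}).keys())]
```

This moves the filtering cost to the client, so it only makes sense when the maps are small enough to be read in full.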

reducing disk space consumption

2016-02-10 Thread Ted Yu
Hi, I am using DSE 4.8.4 On one node, disk space is low where: 42G /var/lib/cassandra/data/usertable/data-0abea7f0cf9211e5a355bf8dafbfa99c Using CLI, I dropped keyspace usertable but the data dir above still consumes 42G. What action would free this part of disk (I don't need the data) ?

Re: reducing disk space consumption

2016-02-10 Thread sai krishnam raju potturi
Suggestion: try the following command: "lsof | grep DEL". If you see a lot of SSTable files in the output, restart the node. The disk space will be reclaimed. Thanks, Sai On Wed, Feb 10, 2016 at 9:59 AM, Ted Yu wrote: > Hi, > I am using DSE 4.8.4 > On one node,

Re: 3k sstables during a repair incremental !!

2016-02-10 Thread Jean Carlo
Hello Kai This is for *cf3* nodetool cfstats pns_nonreg_bench.cf3 -H Keyspace: pns_nonreg_bench Read Count: 23594 Read Latency: 1.2980987539204882 ms. Write Count: 148161 Write Latency: 0.04608940274431193 ms. Pending Flushes: 0 Table: cf3 SSTable count: 11489

Re: Cassandra Collections performance issue

2016-02-10 Thread Benedict Elliott Smith
If the overwrites are per map key there are no tombstones generated; only if the whole map is re-imaged are tombstones created, and prior to 3.0 this can indeed be a major problem if done frequently. Prior to 3.0 collections also forbid certain optimisations to cell comparisons, and as a result can

Do I have to use repair -inc with the option -par forcely?

2016-02-10 Thread Jean Carlo
Hi guys, the question is in the subject. I am testing repair -inc -par and I can see that on all my nodes the number of sstables explodes from 5 to 5k. I cannot allow this behavior on my production cluster. *Is there any way to run incremental repairs without -par?* I know

Re: 3k sstables during a repair incremental !!

2016-02-10 Thread Kai Wang
Jean, what does your cfstats look like? Especially the "SSTables in each level" line. On Wed, Feb 10, 2016 at 8:33 AM, Jean Carlo wrote: > Hello guys! > > I am testing incremental repair in my Cassandra cluster. I am doing my test > over these tables > > *CREATE TABLE

Debugging write timeouts on Cassandra 2.2.5

2016-02-10 Thread Mike Heffner
Hi all, We've recently embarked on a project to update our Cassandra infrastructure running on EC2. We are long time users of 2.0.x and are testing out a move to version 2.2.5 running on VPC with EBS. Our test setup is a 3 node, RF=3 cluster supporting a small write load (mirror of our staging

Re: 3k sstables during a repair incremental !!

2016-02-10 Thread Paulo Motta
Are you using vnodes by any chance? If so, how many? How many nodes and what's the replication factor? How was data inserted (at what consistency level)? Streaming might create a large number of sstables with vnodes (see CASSANDRA-10495), so in case data is inconsistent between nodes (detected

Re: distributing load across cluster

2016-02-10 Thread Ted Yu
I don't see tablestats sub-command: http://pastebin.com/XwwCAqh4 This is DSE 4.8.4 Cheers On Wed, Feb 10, 2016 at 12:05 PM, Jack Krupansky wrote: > What do your partition and cluster keys look like? > > Check a nodetool tablestats to see number of partition keys on

Re: distributing load across cluster

2016-02-10 Thread Jack Krupansky
What do your partition and cluster keys look like? Check a nodetool tablestats to see number of partition keys on the nodes. Also check nodetool tablehistograms to see if you have a lot of too-wide rows due to the balance of data between the partition key and clustering columns. -- Jack

Re: Debugging write timeouts on Cassandra 2.2.5

2016-02-10 Thread Paulo Motta
Are you using the same GC settings as the staging 2.0 cluster? If not, could you try using the default GC settings (CMS) and see if that changes anything? This is just a wild guess, but there were reports before of G1-caused instabilities with small heap sizes (< 16GB - see CASSANDRA-10403 for

Re: distributing load across cluster

2016-02-10 Thread Jack Krupansky
Sorry, I didn't realize you were still living in the stone age with DSE - and Cassandra 2.1. Change "table" to "cf" (column family). -- Jack Krupansky On Wed, Feb 10, 2016 at 3:23 PM, Ted Yu wrote: > I don't see tablestats sub-command: > > http://pastebin.com/XwwCAqh4 > >

Re: distributing load across cluster

2016-02-10 Thread Ted Yu
Here is output from cfstats: http://pastebin.com/W4FVd4RW The keyspace was created as described in https://github.com/cloudius-systems/osv/wiki/Benchmarking-Cassandra-and-other-NoSQL-databases-with-YCSB Data was loaded by using ycsb. Cheers On Wed, Feb 10, 2016 at 12:26 PM, Jack Krupansky

Re: Schema Versioning

2016-02-10 Thread Alex Popescu
On Wed, Feb 10, 2016 at 12:05 PM, Joe Bako wrote: > Modern RDBMS tools can compare schemas between DDL object definitions and > live databases and generate change scripts accordingly. Older techniques > included maintaining a version and script table in the database,

Re: Schema Versioning

2016-02-10 Thread Jonathan Haddad
I wrote most of the cqlengine keyspace & table management pieces of the Python driver to solve this exact problem. Instead of working with a series of statements for creating tables & managing columns, we simply created classes in Python and sync'ed them to the DB. It automatically figured out
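The "declare the desired schema, then sync it" idea described above can be sketched without any driver. The block below is only an illustration of that idea and is not cqlengine's implementation: it compares a desired column set with the live one and plans `ALTER TABLE ... ADD` statements for missing columns. The function name and the dict-based schema representation are assumptions made for this example, and type changes or dropped columns are deliberately ignored.

```python
from typing import Dict, List


def plan_added_columns(table: str, desired: Dict[str, str], live: Dict[str, str]) -> List[str]:
    """Return CQL statements that add the columns present in `desired` but not in `live`.

    Both dicts map column name -> CQL type name. Columns that already exist
    are left alone, even if their type differs (hypothetical simplification).
    """
    statements = []
    for name, cql_type in desired.items():
        if name not in live:
            statements.append(f"ALTER TABLE {table} ADD {name} {cql_type}")
    return statements
```

Because the plan is derived from the difference between the two column sets, running it again after the columns exist yields no statements.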

distributing load across cluster

2016-02-10 Thread Ted Yu
Hi, I am following this guide on a 5 node cluster: https://github.com/cloudius-systems/osv/wiki/Benchmarking-Cassandra-and-other-NoSQL-databases-with-YCSB I am using ycsb-0.5.0 I found that some nodes receive above-average writes, leading to a disk-full condition. I would like some suggestions on

Re: distributing load across cluster

2016-02-10 Thread Jack Krupansky
That's for one node. You can look at the writes for each node. I'm actually not sure if the partition key count includes memtables in addition to sstables. A nodetool flush will ensure that any memtable data gets flushed to sstables. -- Jack Krupansky On Wed, Feb 10, 2016 at 3:30 PM, Ted Yu

Re: Materialized views and composite partition keys

2016-02-10 Thread Abdul Jabbar Azam
Hello, I've just changed my materialized view to have one partition key. The view gets generated now. After some refactoring I found that I didn't need a composite primary key at all. However if I later need one then I'll use a UDT. If it works... On Wed, 10 Feb 2016 at 13:04 DuyHai Doan

Schema Versioning

2016-02-10 Thread Joe Bako
Hi all, I am curious what techniques are used by others to manage schema changes over time in Cassandra. Modern RDBMS tools can compare schemas between DDL object definitions and live databases and generate change scripts accordingly. Older techniques included maintaining a version and

Re: Debugging write timeouts on Cassandra 2.2.5

2016-02-10 Thread Mike Heffner
Paulo, Thanks for the suggestion, we ran some tests against CMS and saw the same timeouts. On that note though, we are going to try doubling the instance sizes and testing with double the heap (even though current usage is low). Mike On Wed, Feb 10, 2016 at 3:40 PM, Paulo Motta

Re: Debugging write timeouts on Cassandra 2.2.5

2016-02-10 Thread Jeff Jirsa
What disk size are you using? From: Mike Heffner Reply-To: "user@cassandra.apache.org" Date: Wednesday, February 10, 2016 at 2:24 PM To: "user@cassandra.apache.org" Cc: Peter Norton Subject: Re: Debugging write timeouts on Cassandra 2.2.5 Paulo, Thanks for the suggestion, we ran some

Re: 3k sstables during a repair incremental !!

2016-02-10 Thread Marcus Eriksson
The reason for this is probably https://issues.apache.org/jira/browse/CASSANDRA-10831 (which only affects 2.1) So, if you had problems with incremental repair and LCS before, upgrade to 2.1.13 and try again /Marcus On Wed, Feb 10, 2016 at 2:59 PM, horschi wrote: > Hi Jean,

Re: best ORM for cassandra

2016-02-10 Thread Nirmallya Mukherjee
I have heard of that but I like to reduce layers & components in my architecture. If my DAO can directly use the C* driver then I believe I am better off. I am sure you already know there are many benefits of the driver - auto discovery, seamless failover, retry, various balancing policies etc

Materialized views and composite partition keys

2016-02-10 Thread Abdul Jabbar Azam
Hello, I tried creating a materialized view using a composite partition key but I got an error. I can't remember the error but it was complaining about the presence of the second field in the partition key. Has anybody experienced this or found a workaround? I haven't tried UDTs yet. -- Regards

Re: [RELEASE] Apache Cassandra 3.3 released

2016-02-10 Thread Will Zhang
Great points. Thank you guys. Sent from my iPhone > On 10 Feb 2016, at 03:39, Jonathan Haddad wrote: > > Adding to Jake's point - it's a noop if you run upgrade sstables and it > doesn't need to be upgraded. So just do it and save yourself a headache. > >> On Tue, Feb

Re: best ORM for cassandra

2016-02-10 Thread Karthik Prasad Manchala
Hi Nirmallya, You can try using Kundera (https://github.com/impetus-opensource/Kundera), a JPA 2.1 compliant Object-Datastore Mapping Library for major NoSql datastores. It also supports Polyglot persistence out-of-the-box. Quick start ->

Re: Materialized views and composite partition keys

2016-02-10 Thread Abdul Jabbar Azam
Ah. I think that's where I'm going wrong. I'll have a look when I get home. On Wed, 10 Feb 2016 at 13:04 DuyHai Doan wrote: > You can't have more than 1 non-pk column from the base table as primary > key column of the view. All is explained here: >

Re: Debugging write timeouts on Cassandra 2.2.5

2016-02-10 Thread Mike Heffner
Jeff, We have both commitlog and data on a 4TB EBS with 10k IOPS. Mike On Wed, Feb 10, 2016 at 5:28 PM, Jeff Jirsa wrote: > What disk size are you using? > > > > From: Mike Heffner > Reply-To: "user@cassandra.apache.org" > Date: Wednesday, February 10, 2016 at 2:24

RE: reducing disk space consumption

2016-02-10 Thread Mohammed Guller
If I remember it correctly, C* creates a snapshot when you drop a keyspace. Run the following command to get rid of the snapshot: nodetool clearsnapshot Mohammed Author: Big Data Analytics with Spark From: Ted Yu
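Before clearing snapshots as suggested above, it can help to see how much space they hold. The sketch below is a hedged illustration, assuming the usual on-disk layout in which each table directory under the data directory contains a `snapshots` subdirectory; the function name is hypothetical, and the data directory (for example the `/var/lib/cassandra/data` path from this thread) must be passed in by the caller.

```python
import os
from typing import Dict


def snapshot_usage(data_dir: str) -> Dict[str, int]:
    """Map each 'snapshots' directory under data_dir to the total size in bytes of its files.

    Assumption: snapshots sit in a directory literally named 'snapshots'
    inside each table directory. Snapshot files are hard links, so the
    reported size is only extra disk usage once the live SSTables they
    point at have been removed (for example after a drop).
    """
    usage: Dict[str, int] = {}
    for root, dirs, _files in os.walk(data_dir):
        if "snapshots" in dirs:
            snap_dir = os.path.join(root, "snapshots")
            total = 0
            for sub_root, _sub_dirs, sub_files in os.walk(snap_dir):
                for name in sub_files:
                    total += os.path.getsize(os.path.join(sub_root, name))
            usage[snap_dir] = total
            # Already measured; do not let the outer walk descend into it.
            dirs.remove("snapshots")
    return usage
```

A call such as `snapshot_usage("/var/lib/cassandra/data")` only reads file sizes; it does not delete anything.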