repairs how do we schedule

2016-03-10 Thread Anishek Agarwal
Hello, we used to run repair on each node using https://github.com/BrianGallew/cassandra_range_repair.git. Most of the time, repairs finished in under 12 hrs per node; we had 4 nodes then. Gradually the repair time kept increasing as traffic increased. We also added more nodes in the meantime; we have 7
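The cassandra_range_repair script mentioned above repairs a node's token range in smaller subranges rather than in one pass. Below is a minimal sketch of only the splitting step. It assumes a single contiguous, non-wrapping range on the Murmur3Partitioner ring (tokens from -2**63 to 2**63 - 1). The helper name is hypothetical, and this is not the script's actual code. Each resulting pair could then be handed to `nodetool repair -st <start> -et <end>`.

```python
from typing import List, Tuple

# Murmur3Partitioner token bounds (signed 64-bit).
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def split_range(start: int, end: int, steps: int) -> List[Tuple[int, int]]:
    """Split the token range (start, end] into `steps` contiguous subranges.

    Hypothetical helper for illustration. Assumes start < end, i.e. a range
    that does not wrap around the end of the ring.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    if not MIN_TOKEN <= start < end <= MAX_TOKEN:
        raise ValueError("expected MIN_TOKEN <= start < end <= MAX_TOKEN")
    width = end - start
    # Integer arithmetic keeps the boundaries exact even for 64-bit tokens.
    bounds = [start + (width * i) // steps for i in range(steps + 1)]
    # Consecutive boundaries form (start, end] pairs that tile the whole range.
    return [(bounds[i], bounds[i + 1]) for i in range(steps)]


if __name__ == "__main__":
    for st, et in split_range(0, 1000, 4):
        print(f"nodetool repair -st {st} -et {et}")
```

Smaller subranges mean each repair session validates and streams less data at once, which is the main motivation for subrange repair on growing clusters.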

Re: What is wrong in this token function

2016-03-10 Thread Rakesh Kumar
Thanks, that explains it. -Original Message- From: Jack Krupansky To: user Sent: Thu, Mar 10, 2016 5:28 pm Subject: Re: What is wrong in this token function From the doc: "When using the RandomPartitioner or Murmur3Partitioner,

Re: What is wrong in this token function

2016-03-10 Thread Jack Krupansky
From the doc: "When using the RandomPartitioner or Murmur3Partitioner, Cassandra rows are ordered by the hash of their value and hence the order of rows is not meaningful... The ByteOrdered partitioner arranges tokens the same way as key values, but the RandomPartitioner and Murmur3Partitioner
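This sketch illustrates the quoted point. With a hashing partitioner, rows come back in token (hash) order, so a token range does not correspond to a range of key values. The sketch uses MD5 from Python's standard library as a stand-in hash. It does not reproduce Cassandra's actual Murmur3Partitioner or RandomPartitioner tokens, and the keys are hypothetical.

```python
import hashlib


def toy_token(key: str) -> int:
    """Stand-in token: first 8 bytes of MD5 read as a signed 64-bit integer.

    Illustrative only; this is not Cassandra's token function.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


keys = [f"customer-{i:03d}" for i in range(20)]

key_order = sorted(keys)                   # roughly what an ordered partitioner gives
token_order = sorted(keys, key=toy_token)  # what a hashing partitioner gives

print("key order:  ", key_order[:5])
print("token order:", token_order[:5])

# A token-range "slice" picks up keys scattered across the key space,
# not the contiguous run of keys between the two endpoints.
lo, hi = sorted((toy_token(keys[3]), toy_token(keys[8])))
in_token_range = [k for k in keys if lo <= toy_token(k) <= hi]
print("keys whose tokens fall in the range:", in_token_range)
```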

Re: Exception about too long clustering key

2016-03-10 Thread Jack Krupansky
The offending code (the limit check and the exception with that message) went away with the so-called "Storage engine refactor, a.k.a. CASSANDRA-8099" changes, which date from September 1, 2014 and were merged on June 30, 2015. GitHub doesn't seem to have a way to search old versions of the repo.

Re: Using User Defined Functions in UPDATE queries

2016-03-10 Thread Kim Liu
It does sound like the use of UDFs in UPDATE is in an ambiguous state at the moment, then. The document's grammar says they can't be used, but the document's examples say they can, and the server will execute them, but it can't execute them in a useful way (i.e. no row-supplied data.) So

Re: What is wrong in this token function

2016-03-10 Thread Rakesh Kumar
I am using the default Murmur3. So are you saying that, in the case of Murmur3, the following two queries select count(*) where customer_id = '289' and event_time >= '2016-03-01 18:45:00+' and event_time <= '2016-03-12 19:05:00+' ; and select count(*) where token(customer_id,event_time) >=

Re: Using User Defined Functions in UPDATE queries

2016-03-10 Thread DuyHai Doan
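Surely an error, because the grammar definition for UPDATE does not mention any function call:
<update-stmt> ::= UPDATE <tablename> ( USING <option> ( AND <option> )* )? SET <assignment> ( ',' <assignment> )* WHERE <where-clause> ( IF <condition> ( AND condition )* )?
<assignment> ::= <identifier> '=' <term> | <identifier> '=' <identifier> ('+' | '-') ( <int-term> | <set-literal> | <list-literal> ) | <identifier> '=' <identifier> '+' <map-literal> | <identifier> '[' <term> ']' '=' <term>
<condition> ::= <identifier> '=' <term> | <identifier> '[' <term> ']' '=' <term>
<where-clause> ::= <relation> ( AND <relation> )*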

Re: What is wrong in this token function

2016-03-10 Thread Jack Krupansky
What partitioner are you using? The default partitioner is not "ordered", so it will randomly order the hashes/tokens, so that tokens will not be ordered even if your PKs are ordered. You probably want to use customer as your partition key and event time as a clustering column - then you can use
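Jack's suggestion is to use the customer as the partition key and the event time as a clustering column. This works because rows inside one partition are stored sorted by the clustering column, so a time-range predicate becomes a contiguous slice. The sketch below models that idea in memory. It assumes a toy model in which each partition is a sorted list of (event_time, event_id) tuples. The names are hypothetical, and it does not model Cassandra's storage engine.

```python
import bisect
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

Row = Tuple[datetime, str]  # (event_time, event_id)

# Partition key (customer_id) -> rows kept sorted by clustering column (event_time).
table: Dict[str, List[Row]] = defaultdict(list)


def insert(customer_id: str, event_time: datetime, event_id: str) -> None:
    """Insert a row while keeping the partition sorted by event_time."""
    bisect.insort(table[customer_id], (event_time, event_id))


def time_slice(customer_id: str, start: datetime, end: datetime) -> List[Row]:
    """Rows with start <= event_time <= end, read from a single partition."""
    rows = table.get(customer_id, [])
    times = [t for t, _ in rows]  # O(n) copy; fine for a sketch
    lo = bisect.bisect_left(times, start)
    hi = bisect.bisect_right(times, end)
    return rows[lo:hi]  # one contiguous slice, no full scan or filtering


if __name__ == "__main__":
    insert("289", datetime(2016, 3, 1, 18, 50), "e1")
    insert("289", datetime(2016, 3, 15, 9, 0), "e2")
    insert("289", datetime(2016, 3, 5, 12, 0), "e3")
    print(time_slice("289", datetime(2016, 3, 1, 18, 45), datetime(2016, 3, 12, 19, 5)))
```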

Re: What is wrong in this token function

2016-03-10 Thread Rakesh Kumar
typo: the primary key was (customer_id + event_time ) -Original Message- From: Rakesh Kumar To: user Sent: Thu, Mar 10, 2016 4:44 pm Subject: What is wrong in this token function C* 3.0.3 I have a table table1 which has the primary

What is wrong in this token function

2016-03-10 Thread Rakesh Kumar
C* 3.0.3 I have a table table1 which has the primary key on ((customer_id,event_id)). I loaded 1.03 million rows from a csv file. Business case: Show me all events for a given customer in a given time frame In RDBMS it will be (Query1) where customer_id = '289' and event_time >=

Re: Using User Defined Functions in UPDATE queries

2016-03-10 Thread Kim Liu
Um, I’m not entirely sure how I misread it, since this was copy-pasted from the document: UPDATE atable SET col = some_function(?) …; So the document examples certainly seem to support the use of UDF in UPDATE. I suppose the document may be more erroneous in its writing than I in its

Re: Using User Defined Functions in UPDATE queries

2016-03-10 Thread DuyHai Doan
You have misread the CQL doc given in the link. According to the CQL UPDATE grammar, it's not possible to use a UDF. I see UDFs allowed only in the SELECT clause... On 10 March 2016 22:07, "Kim Liu" wrote: > Hello - > I am experimenting with User Defined Functions in Cassandra

Using User Defined Functions in UPDATE queries

2016-03-10 Thread Kim Liu
Hello - I am experimenting with User Defined Functions in Cassandra (3.3) and I am a bit puzzled by a problem I am having when testing them with cqlsh. I have tried to find the answers online, but have not had any luck so far. According to http://cassandra.apache.org/doc/cql3/CQL.html it looks

Re: How to measure the write amplification of C*?

2016-03-10 Thread Jack Krupansky
The doc does say this: "A log-structured engine that avoids overwrites and uses sequential IO to update data is essential for writing to solid-state disks (SSD) and hard disks (HDD). On HDD, writing randomly involves a higher number of seek operations than sequential writing. The seek penalty

Re: Unexplainably large reported partition sizes

2016-03-10 Thread Tom van den Berge
Thanks guys. I've upgraded to 2.2.5, and the problem is gone. Tom On Wed, Mar 9, 2016 at 10:47 PM, Robert Coli wrote: > On Mon, Mar 7, 2016 at 1:25 PM, Nate McCall > wrote: > >> >>> Rob, can you remember which bug/jira this was? I have not been

Re: Exception about too long clustering key

2016-03-10 Thread Emīls Šolmanis
Jack Yeah, I tracked it down to https://github.com/apache/cassandra/blob/cassandra-2.2.4/src/java/org/apache/cassandra/cql3/QueryProcessor.java#L213 But it's actually to do with how the cell names are being constructed for collections somehow. The offender was a set, but I solved it by

Re: Exception about too long clustering key

2016-03-10 Thread Jack Krupansky
Did you ever find the source of the message? I couldn't find it on GitHub either, neither in the driver nor in Cassandra proper. -- Jack Krupansky On Thu, Mar 10, 2016 at 12:39 PM, Emīls Šolmanis wrote: > In case someone stumbles upon this same thing later. > > Ended up

Re: ntpd clock sync

2016-03-10 Thread Robert Coli
On Wed, Mar 9, 2016 at 9:03 AM, K F wrote: > the clock is about 30 to 40 seconds behind. > If you don't want to get ntp working there, why not just... manually... set the clocks? =Rob

Re: How can I make Cassandra stable in a 2GB RAM node environment ?

2016-03-10 Thread Robert Coli
On Thu, Mar 10, 2016 at 3:27 AM, Alain RODRIGUEZ wrote: > So, like Jack, I really do not recommend it in general unless you know what you > are doing and don't care about facing those issues. > Certainly a spectrum of views here, but everyone (including OP) seems to agree with

Re: How to measure the write amplification of C*?

2016-03-10 Thread Sebastian Estevez
https://issues.apache.org/jira/browse/CASSANDRA-10805 All the best, Sebastián Estévez Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

Re: How to measure the write amplification of C*?

2016-03-10 Thread Jeff Ferland
Compaction logs show the number of bytes written and the level written to. Base write load = table flushed to L0. Write amplification = sum of all compactions written to disk for the table. On Thu, Mar 10, 2016 at 9:44 AM, Dikang Gu wrote: > Hi Matt, > > Thanks for the
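Jeff's definition can be turned into simple arithmetic. Take the bytes flushed from memtables (the base write load) and the bytes written by every compaction for the same table, then compare the two. The sketch below assumes you have already extracted those byte counts from the logs yourself. The sample numbers are hypothetical. It uses one convention: (flush bytes + compaction bytes) / flush bytes. Whether to count the flush itself in the numerator is a choice worth stating explicitly.

```python
from typing import Iterable


def write_amplification(flushed_bytes: int, compaction_bytes: Iterable[int]) -> float:
    """(bytes flushed + bytes written by compactions) / bytes flushed.

    flushed_bytes: total bytes written by memtable flushes (base write load).
    compaction_bytes: bytes written to disk by each compaction of the same table.
    """
    if flushed_bytes <= 0:
        raise ValueError("flushed_bytes must be positive")
    total_written = flushed_bytes + sum(compaction_bytes)
    return total_written / flushed_bytes


if __name__ == "__main__":
    # Hypothetical numbers: 10 GiB flushed, three compactions' worth of output.
    gib = 1024 ** 3
    wa = write_amplification(10 * gib, [12 * gib, 25 * gib, 30 * gib])
    print(f"write amplification ~ {wa:.1f}x")
```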

Re: How to measure the write amplification of C*?

2016-03-10 Thread Jeff Jirsa
A bit of Splunk-fu probably works for this – you’ll have different line entries for memtable flushes vs compaction output. Comparing the two will give you a general idea of compaction amplification. From: Dikang Gu Reply-To: "user@cassandra.apache.org" Date: Thursday, March 10, 2016 at

Re: How to measure the write amplification of C*?

2016-03-10 Thread Matt Kennedy
It isn't really the data written by the host that you're concerned with, it's the data written by your application. I'd start by instrumenting your application tier to tally up the size of the values that it writes to C*. However, it may not be extremely useful to have this value. You can't do

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jonathan Haddad
Oops sorry, you wrote below that the shard is what I was suggesting. I didn't fully understand the problem you had. I'll think about it a little bit and come up w/ something. On Thu, Mar 10, 2016 at 9:47 AM Jonathan Haddad wrote: > My advice was to use the date that the

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jonathan Haddad
My advice was to use the date that the reading was recorded as part of the Partition key instead of some arbitrary shard id. Then you don't have to look anything up in a different table. create table sensorReadings ( sensorUnitId int, sensorId int, date_recorded date, time timestamp, timeShard

Re: How to measure the write amplification of C*?

2016-03-10 Thread Dikang Gu
Hi Matt, Thanks for the detailed explanation! Yes, this is exactly what I'm looking for, "write amplification = data written to flash/data written by the host". We are heavily using the LCS in production, so I'd like to figure out the amplification caused by that and see what we can do to

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jason Kania
Jack, Thanks for the response. I don't think I provided enough information, and I used the wrong terminology, as your response is more the canned advice given in response to Cassandra antipatterns. To make this clearer, this is what we are doing: create table sensorReadings (sensorUnitId int, sensorId

Re: How to measure the write amplification of C*?

2016-03-10 Thread Matt Kennedy
After posting this, Jon Haddad pinged me on chat and said (I'm paraphrasing): Actually, this company I work with a lot burns through SSDs so fast it's absurd; their write amp is gigantic. This is a very good point; however, it isn't what I would call typical, and a lot is going to depend on the

Re: Exception about too long clustering key

2016-03-10 Thread Emīls Šolmanis
In case someone stumbles upon this same thing later. Ended up being a collection item that was too big (i.e., larger than 64K). Something to do with the way Cassandra generates the keys for collections, but moving the offending collection from a list to a separate clustering key solved this
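Emīls' fix points to a size limit on collection elements. Per the thread the offender was a set, and set elements end up in the cell name. The guard below is a minimal client-side sketch. It assumes the limit is 65535 bytes (an unsigned 16-bit length prefix, consistent with the "larger than 64K" in the message) and that elements are UTF-8 text. The function name is hypothetical; this is not a Cassandra or driver API.

```python
# Assumed limit: 65535 bytes, i.e. the maximum value of an unsigned 16-bit length.
MAX_CELL_NAME_COMPONENT = 0xFFFF


def check_set_element(value: str) -> None:
    """Raise ValueError if the UTF-8 encoding of a set element exceeds the assumed limit."""
    size = len(value.encode("utf-8"))  # the limit is in bytes, not characters
    if size > MAX_CELL_NAME_COMPONENT:
        raise ValueError(
            f"set element is {size} bytes; limit is {MAX_CELL_NAME_COMPONENT} bytes"
        )
```

Checking the encoded length rather than `len(value)` matters because multi-byte characters (for example "é", 2 bytes in UTF-8) reach the limit with fewer characters.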

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jason Kania
Hi Jonathan, Thanks for the response. To make this clearer, this is what we are doing: create table sensorReadings (sensorUnitId int, sensorId int, time timestamp, timeShard int, readings blob, primary key((sensorUnitId, sensorId, timeShard), time)); where timeShard is a combination of year and week
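Jason describes timeShard as a combination of year and week. The sketch below shows one way such a value might be computed. It assumes ISO-8601 weeks and an encoding of year * 100 + week (e.g. 201610); the message specifies neither. Using the ISO year matters because the first days of January can belong to the last ISO week of the previous year. The function name is hypothetical.

```python
from datetime import date


def time_shard(day: date) -> int:
    """Hypothetical shard id: ISO year * 100 + ISO week number.

    Accepts a date or a datetime (datetime is a subclass of date).
    """
    iso_year, iso_week, _ = day.isocalendar()
    return iso_year * 100 + iso_week


if __name__ == "__main__":
    print(time_shard(date(2016, 3, 10)))
    print(time_shard(date(2016, 1, 1)))  # belongs to ISO week 53 of 2015
```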

Re: How to measure the write amplification of C*?

2016-03-10 Thread Matt Kennedy
TL;DR - Cassandra actually causes a ton of write amplification but it doesn't freaking matter any more. Read on for details... That slide deck does have a lot of very good information on it, but unfortunately I think it has led to a fundamental misunderstanding about Cassandra and write

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jonathan Haddad
Have you considered making the date (or week, or whatever, some time component) part of your partition key? Something like: create table sensordata ( sensor_id int, day date, ts timestamp, reading int, primary key((sensor_id, day), ts)); Then if you know you need data by a particular date range,
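With (sensor_id, day) as the partition key, a date-range read has to address each day's partition separately. The sketch below enumerates the partition keys such a read would touch. A client would then issue one `SELECT ... WHERE sensor_id = ? AND day = ?` per key, possibly in parallel. The function name is hypothetical, and the sketch ignores any extra `ts` bounds on the first and last day.

```python
from datetime import date, timedelta
from typing import List, Tuple


def day_partitions(sensor_id: int, start: date, end: date) -> List[Tuple[int, date]]:
    """(sensor_id, day) partition keys covering start..end, inclusive."""
    if end < start:
        return []
    n_days = (end - start).days + 1
    return [(sensor_id, start + timedelta(days=i)) for i in range(n_days)]


if __name__ == "__main__":
    for key in day_partitions(7, date(2016, 2, 28), date(2016, 3, 1)):
        print(key)
```

The number of queries grows with the length of the range, which is the trade-off against one very wide partition per sensor.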

Re: Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jack Krupansky
There is an effort underway to support wider rows: https://issues.apache.org/jira/browse/CASSANDRA-9754 This won't help you now though. Even with that improvement you still may need a more optimal data model since large-scale scanning/filtering is always a very bad idea with Cassandra. The data

Strategy for dividing wide rows beyond just adding to the partition key

2016-03-10 Thread Jason Kania
Hi, We have sensor input that creates very wide rows, and operations on these rows have started to time out regularly. We have been trying to find a solution to dividing wide rows but keep hitting limitations that move the problem around instead of solving it. We have a partition key consisting of

Re: How to measure the write amplification of C*?

2016-03-10 Thread Paulo Motta
This is a good source on Cassandra + write amplification: http://www.slideshare.net/rbranson/cassandra-and-solid-state-drives 2016-03-10 9:57 GMT-03:00 Benjamin Lerer: > Cassandra should not cause any write amplification. Write amplification > happens only when you

Re: How to measure the write amplification of C*?

2016-03-10 Thread Alain RODRIGUEZ
Hi Dikang, I am not sure what you call "amplification", but as sizes highly depend on the structure, I think I would probably give it a try using CCM (https://github.com/pcmanus/ccm) or some test cluster with 'production-like' settings and schema. You can write a row, flush it and see how

Re: How can I make Cassandra stable in a 2GB RAM node environment ?

2016-03-10 Thread Alain RODRIGUEZ
+1 for Rob's comment. I would add that I have learned a lot from running Cassandra on AWS t1.micro machines (800 MB RAM), then small, medium, Large, ..., i2.2XL. I had to tweak every single parameter in cassandra.yaml and cassandra-env.sh, so I learned a lot about internals; I had to! Even if I am glad

RE: [C*2.1]memtable_allocation_type: offheap_objects

2016-03-10 Thread aeljami.ext
Thank you! Another question, please: in the blog post (http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1), it is noted that offheap_objects is effective for small values like ints or UUIDs as well, and offheap_buffers for tables with strings or blobs. In my case, the tables

Re: moving keyspaces to another disk while Cassandra is running

2016-03-10 Thread Alain RODRIGUEZ
FWIW: http://thelastpickle.com/blog/2016/02/25/removing-a-disk-mapping-from-cassandra.html I know it is not exactly what you want, but I believe it might be useful. C*heers, --- Alain Rodriguez - al...@thelastpickle.com France The Last Pickle - Apache Cassandra Consulting

Re: Removing Node causes bunch of HostUnavailableException

2016-03-10 Thread Alain RODRIGUEZ
Hi Praveen, how is this going? I have been out for a while; did you manage to remove the nodes? Do you need more help? If so, I could use a status update and more information about the remaining issues.

Re: Cassandra-stress output

2016-03-10 Thread Jean Carlo
However, it would be nice to have the possibility to configure that with Cassandra options, or when using a YAML file to insert data into any table. Regards, Jean Carlo "The best way to predict the future is to invent it" Alan Kay On Thu, Mar 10, 2016 at 10:48 AM, Jean Carlo

Re: Cassandra-stress output

2016-03-10 Thread Jean Carlo
Thank you very much, S. Alborghetti. I will consider that suggestion. On Thu, Mar 10, 2016 at 5:47 AM, Stefania Alborghetti <stefania.alborghe...@datastax.com> wrote: > On Tue, Mar 8, 2016 at 8:39 PM, Jean Carlo