Re: What does the rate signify for latency in the JMX Metrics?

2014-05-16 Thread Chris Burroughs
They are exponentially decaying moving averages (like Unix load averages) of the number of events per unit of time. http://wiki.apache.org/cassandra/Metrics might help. On 04/17/2014 06:06 PM, Redmumba wrote: Good afternoon, I'm attempting to integrate the metrics generated via JMX into our

Re: What does the rate signify for latency in the JMX Metrics?

2014-05-16 Thread Chris Lohfink
What does the rate signify in this context? For example, given the OneMinuteRate of 675.7673129014964 and the unit of seconds--what is this measuring? It means that there were about 675 write requests per second over the last minute. As Other Chris (tm) mentioned, this is exp decaying
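To make the "exponentially decaying moving average" idea concrete, here is a minimal Python sketch of a load-average-style one-minute rate. It is not the Metrics library's code: the 5-second tick, the class name OneMinuteRate and the methods mark/tick are assumptions chosen for illustration.

```python
import math

TICK_SECONDS = 5      # assumed sampling interval (hypothetical value)
WINDOW_SECONDS = 60   # the "one minute" in OneMinuteRate

# Smoothing factor: how much weight the newest sample gets on each tick.
ALPHA = 1 - math.exp(-TICK_SECONDS / WINDOW_SECONDS)


class OneMinuteRate:
    """Load-average-style moving average of events per second."""

    def __init__(self):
        self.uncounted = 0   # events seen since the last tick
        self.rate = None     # smoothed events per second

    def mark(self, n=1):
        # Record n events (for example, n write requests).
        self.uncounted += n

    def tick(self):
        # Fold the events of the last interval into the smoothed rate.
        instant = self.uncounted / TICK_SECONDS
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant          # first sample seeds the average
        else:
            self.rate += ALPHA * (instant - self.rate)
        return self.rate
```

With this sketch, a steady 675 events per second settles at a rate of 675, and the value decays gradually rather than dropping to zero the moment traffic stops.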

Re: clearing tombstones?

2014-05-16 Thread Ruchir Jha
I tried to do this, however the doubling in disk space is not temporary as you state in your note. What am I missing? On Fri, Apr 11, 2014 at 10:44 AM, William Oberman ober...@civicscience.com wrote: So, if I was impatient and just wanted to make this happen now, I could: 1.) Change

Multi-dc cassandra keyspace

2014-05-16 Thread Anand Somani
Hi, It seems like it should be possible to have a keyspace replicated only to a subset of DC's on a given cluster spanning across multiple DCs? Is there anything bad about this approach? Scenario Cluster spanning 4 DC's = CA, TX, NY, UT Has multiple keyspaces such that * keyspace_CA_TX -

Erase old sstables to make room for new sstables

2014-05-16 Thread Redmumba
In the system we're using, we have a large fleet of servers constantly appending time-based data to our database--it's largely writes, very few reads (it's auditing data). However, our cluster max space is around 80TB, and we'd like to maximize how much data we can retain. One option is to

Tombstones

2014-05-16 Thread Dimetrio
Does cassandra delete tombstones during simple LCS compaction or I should use node tool repair? Thanks. -- View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Tombstones-tp7594467.html Sent from the cassandra-u...@incubator.apache.org mailing list

Re: Disable reads during node rebuild

2014-05-16 Thread Paulo Ricardo Motta Gomes
That'll be really useful, thanks!! On Wed, May 14, 2014 at 7:47 PM, Aaron Morton aa...@thelastpickle.com wrote: As of 2.0.7, driftx has added this long-requested feature. Thanks A - Aaron Morton New Zealand @aaronmorton Co-Founder Principal Consultant Apache

Re: Efficient bulk range deletions without compactions by dropping SSTables.

2014-05-16 Thread graham sanderson
Just a few data points from our experience One of our use cases involves storing a periodic full base state for millions of records, then fairly frequent delta updates to subsets of the records in between. C* is great for this because we can read the whole row (or up to the clustering

Re: Mutation messages dropped

2014-05-16 Thread Paulo Ricardo Motta Gomes
It means asynchronous write mutations were dropped, but if the writes are completing without TimedOutException, then at least ConsistencyLevel replicas were correctly written. The remaining replicas will eventually be fixed by hinted handoff, anti-entropy (repair) or read repair. More info:
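The "at least ConsistencyLevel replicas" condition can be sketched in Python as below. This is an illustration of the counting rule only, not Cassandra's implementation; the function names are hypothetical, and datacenter-aware levels such as LOCAL_QUORUM are deliberately left out.

```python
def required_acks(consistency_level, replication_factor):
    """Number of replica acknowledgements a write needs (simplified)."""
    level = consistency_level.upper()
    if level == "ONE":
        return 1
    if level == "TWO":
        return 2
    if level == "THREE":
        return 3
    if level == "QUORUM":
        # A majority of the replicas.
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: " + consistency_level)


def write_succeeded(acks, consistency_level, replication_factor):
    # The coordinator reports success once enough replicas acknowledged;
    # replicas that dropped the mutation are repaired later.
    return acks >= required_acks(consistency_level, replication_factor)
```

For example, with a replication factor of 3 and QUORUM, a write that reached 2 replicas succeeds even if the third replica dropped the mutation.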

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Peter Lin
perhaps the committers should invite other developers that have shown an interest in contributing to Cassandra. the rate of adding new non-Datastax committers appears to have been low over the last 2 years. I have no data to support it, it's just a feeling based on personal observations over the last 3 years.

Re: Query first 1 columns for each partitioning keys in CQL?

2014-05-16 Thread Jonathan Lacefield
Hello, Have you looked at using the CLUSTERING ORDER BY and LIMIT features of CQL3? These may help you achieve your goals. http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/refClstrOrdr.html http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/select_r.html

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Jeremy Hanna
Of the 16 active committers, 8 are not at DataStax. See http://wiki.apache.org/cassandra/Committers. That said, active involvement varies and there are other contributors inside DataStax and in the community. You can look at the dev mailing list as well to look for involvement in more

RE: NTS, vnodes and 0% chance of data loss

2014-05-16 Thread Mark Farnan
Why not use NetworkTopology and specify each region as a ‘DC’ ? Setup a snitch (propertyFile or Gossip, or even the EC2Region one) to list out which nodes are in which DC. Then when creating the Keyspace, specify NetworkTopology, with RF1 in each DC / Rack. Ie. CREATE KEYSPACE

ANN Cassaforte 1.3.0 is released

2014-05-16 Thread Michael Klishin
Cassaforte [1] is a Clojure client for Cassandra built around CQL and focusing on ease of use. Release notes: http://blog.clojurewerkz.org/blog/2014/05/15/cassaforte-1-dot-3-0-is-released/ 1. http://clojurecassandra.info -- MK http://github.com/michaelklishin http://twitter.com/michaelklishin

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Michael Shuler
On 05/14/2014 03:39 PM, Kevin Burton wrote: I'm curious what % of cassandra developers are employed by Datastax? http://wiki.apache.org/cassandra/Committers -- Kind regards, Michael

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Janne Jalkanen
Don’t know, but as a potential customer of DataStax I’m also concerned at the fact that there does not seem to be a competitor offering Cassandra support and services. All innovation seems to be occurring only in the OSS version or DSE(*). I’d welcome a competitor for DSE - it does not even

Re: Mutation messages dropped

2014-05-16 Thread Chris Lohfink
Shameless plug: http://www.evidencebasedit.com/guide-to-cassandra-thread-pools/#droppable On May 15, 2014, at 7:37 PM, Mark Reddy mark.re...@boxever.com wrote: Yes, please see http://wiki.apache.org/cassandra/FAQ#dropped_messages for further details. Mark On Fri, May 9, 2014 at

Migrate a model from 0.6

2014-05-16 Thread cbert...@libero.it
Hi all, more than a year ago I wrote a comment for migrating an old schema to a new model. Since the company had other priorities we didn't realize, and now I'm trying to upgrade my 0.6 data-model to the newest 2.0 model. The DB contains mainly comments written by users on companies. Comments

Re: Cassandra MapReduce/Storm/ etc

2014-05-16 Thread Jack Krupansky
Here’s a meetup talk on analytics using Cassandra, Storm, and Kafka: http://www.slideshare.net/aih1013/building-largescale-analytics-platform-with-storm-kafka-and-cassandra-nyc-storm-user-group-meetup-21st-nov-2013 -- Jack Krupansky From: Manoj Khangaonkar Sent: Thursday, May 8, 2014 5:43 PM

conditional delete consistency level/timeout

2014-05-16 Thread Mohica Jasha
Earlier I reported the following bug against C* 2.0.5 https://issues.apache.org/jira/browse/CASSANDRA-7176 It seems to be fixed in C* 2.0.7, but we are still seeing similar suspicious timeouts. We have a cluster of C* 2.0.7, DC1:3, DC2:3 We have the following table: CREATE TABLE

Re: Cassandra 2.0.7 always fails due to 'too many open files' error

2014-05-16 Thread Yatong Zhang
Yes the global limits are OK. I added cassandra to '/etc/rc.local' to make it auto-startup, but seems the modification of limits didn't take effect. I observed this as Bryan suggested, so I added ulimit -SHn 99 to '/etc/rc.local' and before cassandra start command, and it worked. On Thu,

Re: What does the rate signify for latency in the JMX Metrics?

2014-05-16 Thread Redmumba
Unfortunately, I found the documentation to be very lackluster. However, I have actually begun to use the Yammer Metrics library in other projects, so I have a much better understanding of what it generates. Thank you for the response! (also, for some strange reason, I am just getting the email

Re: Tombstones

2014-05-16 Thread Arya Goudarzi
Nodetool cleanup deletes rows that aren't owned by specific tokens (shouldn't be on this node). And nodetool repair makes sure data is in sync between all replicas. It is wrong to say either of these commands cleans up tombstones. Tombstones are cleaned up only during compactions, and only if they are

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Kevin Burton
Perhaps because the developers are working on DSE :-P On Fri, May 16, 2014 at 8:13 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Of the 16 active committers, 8 are not at DataStax. See http://wiki.apache.org/cassandra/Committers. That said, active involvement varies and there are

Re: Tombstones

2014-05-16 Thread Omar Shibli
Yes, but still you need to run 'nodetool cleanup' from time to time to make sure all tombstones are deleted. On Fri, May 16, 2014 at 10:11 AM, Dimetrio dimet...@flysoft.ru wrote: Does cassandra delete tombstones during simple LCS compaction or I should use node tool repair? Thanks. --

Re: Can Cassandra client programs use hostnames instead of IPs?

2014-05-16 Thread Huiliang Zhang
Thanks. My case is that there is no public ip and VPN cannot be set up. It seems that I have to run EMR job to operate on the AWS cassandra cluster. I got some timeout errors during running the EMR job as: java.lang.RuntimeException: Could not retrieve endpoint ranges: at

Re: Really need some advices on large data considerations

2014-05-16 Thread Yatong Zhang
Hi Michael, thanks for the reply, I would RAID0 all those data drives, personally, and give up managing them separately. They are on multiple PCIe controllers, one drive per channel, right? Raid 0 is a simple way to go but one disk failure can cause the whole volume down, so I am afraid raid

Data modeling for Pinterest-like application

2014-05-16 Thread ziju feng
Hello, I'm working on data modeling for a Pinterest-like project. There are basically two main concepts: Pin and Board, just like Pinterest, where pin is an item containing an image, description and some other information such as a like count, and each board should contain a sorted list of Pins.

Re: Storing log structured data in Cassandra without compactions for performance boost.

2014-05-16 Thread Ben Bromhead
If you make the timestamp the partition key you won't be able to do range queries (unless you use an ordered partitioner). Assuming you are logging from multiple devices you will want your partition key to be the device id plus the date, your clustering key to be the timestamp (timeuuid are good
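A minimal Python sketch of that key layout follows: the partition key combines the device id with the calendar date, and the full timestamp stays as the clustering value inside the partition. The function name, the UTC day boundary and the date format are assumptions made for illustration.

```python
from datetime import datetime, timezone


def partition_key(device_id, event_time):
    """Return (device id, day) for a timezone-aware event timestamp.

    All events of one device on one UTC day share a partition; within
    that partition, rows would be ordered by the timestamp itself.
    """
    day = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (device_id, day)


# Two events from the same device on the same day share a partition,
# so a time-range query for that day touches a single partition.
first = partition_key("device-42", datetime(2014, 5, 16, 8, 19, tzinfo=timezone.utc))
second = partition_key("device-42", datetime(2014, 5, 16, 23, 59, tzinfo=timezone.utc))
next_day = partition_key("device-42", datetime(2014, 5, 17, 0, 0, tzinfo=timezone.utc))
```

Bucketing by day keeps any one partition from growing without bound while still allowing range scans over the timestamps inside it.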

Re: How long are expired values actually returned?

2014-05-16 Thread Sebastian Schmidt
Thank you for your answer, I really appreciate that you want to help me. But already found out that I did something wrong in my implementation. Am 13.05.2014 02:53, schrieb Chris Lohfink: That is not expected. What client are you using and how are you setting the ttls? What version of

Re: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)

2014-05-16 Thread Paulo Ricardo Motta Gomes
Hello Anton, What version of Cassandra are you using? If between 1.2.6 and 2.0.6 the setInputRange(startToken, endToken) is not working. This was fixed in 2.0.7: https://issues.apache.org/jira/browse/CASSANDRA-6436 If you can't upgrade you can copy AbstractCFIF and CFIF to your project and

Tombstones on secondary indexes

2014-05-16 Thread Joel Samuelsson
My system log is full of messages like this one: WARN [ReadStage:42] 2014-05-15 08:19:13,615 SliceQueryFilter.java (line 210) Read 0 live and 2829 tombstoned cells in TrafficServer.rawData.rawData_evaluated_idx (see tombstone_warn_threshold) I've run a major compaction but the tombstones are not

Storing globally sorted data

2014-05-16 Thread Kevin Burton
Let's say I have an external job (MR, pig, etc) sorting a cassandra table by some complicated mechanism. We want to store the sorted records BACK into cassandra so that clients can read the records sorted. What I was just thinking of doing was storing the records as pages. So page 0 would have

Re: Really need some advices on large data considerations

2014-05-16 Thread DuyHai Doan
You can watch this: https://www.youtube.com/watch?v=uoggWahmWYI Aaron discusses support for big nodes. On Wed, May 14, 2014 at 3:13 AM, Yatong Zhang bluefl...@gmail.com wrote: Thank you Aaron, but we're planning about 20T per node, is that feasible? On Mon, May 12, 2014 at 4:33

Re: How does cassandra page through low cardinality indexes?

2014-05-16 Thread DuyHai Doan
Hello Kevin For the internal working of secondary index and LIMIT, you can have a look at this : https://issues.apache.org/jira/browse/CASSANDRA-5975 The comments and attached patch will give you a hint on how LIMIT is implemented. Alternatively you can look directly in the source code

Re: Best partition type for Cassandra with JBOD

2014-05-16 Thread Kevin Burton
That and nobarrier… and probably noop for the scheduler if using SSD and setting readahead to zero... On Fri, May 16, 2014 at 10:29 AM, James Campbell ja...@breachintelligence.com wrote: Hi all— What partition type is best/most commonly used for a multi-disk JBOD setup running Cassandra

ownership not equally distributed

2014-05-16 Thread Rameez Thonnakkal
Hello, I have a 4-node cluster where 2 nodes are in one data center and another 2 in a different one. But in the first data center the token ownership is not equally distributed. I am using the vnode feature. num_tokens is set to 256 on all nodes. initial_number is left blank. Datacenter: DC1

Re: Best partition type for Cassandra with JBOD

2014-05-16 Thread Ariel Weisberg
Hi, Recommending nobarrier (mount option barrier=0) when you don't know if a non-volatile cache is in play is probably not the way to go. A non-volatile cache will typically ignore write barriers if a given block device is configured to cache writes anyways. I am also skeptical you will see a

Best partition type for Cassandra with JBOD

2014-05-16 Thread James Campbell
Hi all- What partition type is best/most commonly used for a multi-disk JBOD setup running Cassandra on CentOS 64bit? The datastax production server guidelines recommend XFS for data partitions, saying, Because Cassandra can use almost half your disk space for a single file, use XFS when

Questions on Leveled Compaction sizing and compaction corner cases

2014-05-16 Thread DuyHai Doan
I was reading this http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra and need some confirmation: A Sizing *Each level is ten times as large as the previous* In the comments: At October 14, 2011 at 12:33
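The "ten times as large as the previous" rule can be worked through with a small Python sketch. The capacity of the first level is left as a parameter because it depends on the configured sstable size; the function names are hypothetical and this is arithmetic only, not the compaction strategy's code.

```python
def level_capacity(level, base_capacity):
    """Capacity of level 1, 2, 3, ... when each level is 10x the previous."""
    return base_capacity * 10 ** (level - 1)


def levels_needed(total, base_capacity):
    """Smallest number of levels whose combined capacity holds `total`."""
    level = 1
    remaining = total
    while remaining > level_capacity(level, base_capacity):
        remaining -= level_capacity(level, base_capacity)
        level += 1
    return level
```

For instance, if level 1 holds 50 units of data, then level 2 holds 500 and level 3 holds 5000, so 551 units spill into a third level while 550 still fit in two.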

Re: Storing globally sorted data

2014-05-16 Thread DuyHai Doan
What you show is basically the idea of bucketing data. One bucket = one physical partition. Within each bucket, there is a fixed number of columns (1000 in your example). This strategy works fine and avoids overly large partitions. The only drawback I would see is the need to fetch data over buckets
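The bucketing idea can be sketched in Python as follows, using the bucket size of 1000 from the example. The function names are hypothetical; the sketch only shows how a global sort position maps to a bucket and which buckets a page of results spans.

```python
BUCKET_SIZE = 1000  # records per bucket, as in the example above


def bucket_for(rank):
    """Map a global sort position to (bucket number, offset in bucket)."""
    return divmod(rank, BUCKET_SIZE)


def buckets_for_range(start, count):
    """Bucket numbers a client must read to get `count` records from `start`."""
    if count <= 0:
        return []
    first = start // BUCKET_SIZE
    last = (start + count - 1) // BUCKET_SIZE
    return list(range(first, last + 1))
```

A read of 20 records starting at position 990 straddles buckets 0 and 1, which illustrates the cost of fetching across bucket boundaries.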

Re: Data modeling for Pinterest-like application

2014-05-16 Thread DuyHai Doan
The problem is whether I should denormalize details of pins into the board table or just retrieve pins by page (page size can be 10~20) and then multi-get by pin_ids to obtain details -- Denormalize is the best way to go in your case. Otherwise, for 1 board read, you'll have 10-20 subsequent

Re: Backup procedure

2014-05-16 Thread Chris Burroughs
It's also good to note that only the Data files are compressed already. Depending on your data the Index and other files may be a significant percent of total on disk data. On 05/02/2014 01:14 PM, tommaso barbugli wrote: In my tests compressing with lzop sstables (with cassandra compression

Number of rows under one partition key

2014-05-16 Thread Vegard Berget
Hi, I know this has been discussed before, and I know there are limitations to how many rows one partition key in practice can handle.  But I am not sure if number of rows or total data is the deciding factor.  I know the thrift interface well, but this is my first project where we are actively

Re: Counter column family performance problems

2014-05-16 Thread Robert Coli
On Mon, May 12, 2014 at 3:03 PM, Batranut Bogdan batra...@yahoo.com wrote: I have a counter CF defined as pk text PRIMARY KEY, a counter, b counter, c counter, d counter Feel free to comment and share experiences about counter CF performance. Briefly : 1) Counters original version are

Re: Tombstones

2014-05-16 Thread Keith Wright
Note that Cassandra will not compact away some tombstones if you have differing column TTLs. See the following jira and resolution I filed for this: https://issues.apache.org/jira/browse/CASSANDRA-6654 On May 16, 2014 4:49 PM, Chris Lohfink clohf...@blackbirdit.com wrote: It will delete them

Re: Failed to mkdirs $HOME/.cassandra

2014-05-16 Thread Dave Brosius
For now you can edit the nodetool script itself by adding -Duser.home=/tmp as in $JAVA $JAVA_AGENT -cp $CLASSPATH -Xmx32m -Duser.home=/tmp -Dlogback.configurationFile=logback-tools.xml -Dstorage-config=$CASSANDRA_CONF org.apache.cassandra.tools.NodeTool -p $JMX_PORT $ARGS if

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Jack Krupansky
You can always check the project committer wiki: http://wiki.apache.org/cassandra/Committers -- Jack Krupansky From: Kevin Burton Sent: Wednesday, May 14, 2014 4:39 PM To: user@cassandra.apache.org Subject: What % of cassandra developers are employed by Datastax? I'm curious what % of

Query first 1 columns for each partitioning keys in CQL?

2014-05-16 Thread Matope Ono
Hi, I'm modeling some queries in CQL3. I'd like to query first 1 columns for each partitioning keys in CQL3. For example: create table posts( author ascii, created_at timeuuid, entry text, primary key(author,created_at) ); insert into posts(author,created_at,entry) values

Re: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)

2014-05-16 Thread Clint Kelly
Hi Anton, One approach you could look at is to write a custom InputFormat that allows you to limit the token range of rows that you fetch (if the AbstractColumnFamilyInputFormat does not do what you want). Doing so is not too much work. If you look at the class RowIterator within

How does cassandra page through low cardinality indexes?

2014-05-16 Thread Kevin Burton
I'm struggling with cassandra secondary indexes since the documentation seems all over the place and I'm having to put together everything from blog posts. Anyway. If I have a low cardinality index of say 10 values, and 1M records. This means each secondary index key will have references to

Re: Tombstones

2014-05-16 Thread Chris Lohfink
It will delete them after gc_grace_seconds (set per table) and a compaction. --- Chris Lohfink On May 16, 2014, at 9:11 AM, Dimetrio dimet...@flysoft.ru wrote: Does cassandra delete tombstones during simple LCS compaction or I should use node tool repair? Thanks. -- View this
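As a rough illustration of the timing rule, the Python sketch below checks whether a tombstone is old enough to be dropped by a compaction. It is a simplification under stated assumptions: the function name is hypothetical, times are plain epoch seconds, and other conditions a real compaction checks (such as whether the compacted sstables cover all the data the tombstone shadows) are ignored.

```python
DEFAULT_GC_GRACE_SECONDS = 864000  # 10 days, the default per-table setting


def tombstone_purgeable(deleted_at, now, gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """True if a compaction running at `now` may drop the tombstone.

    The tombstone must be strictly older than gc_grace_seconds.
    """
    return deleted_at < now - gc_grace_seconds
```

A tombstone that passes this check is still only removed when a compaction actually rewrites the sstable that contains it.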

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Chris Lohfink
There does seem to be some effort trying to encourage others - DataStax had some talks explaining how to contribute. This year there is even an extra bootcamp http://learn.datastax.com/CassandraSummitBootcampApplication.html On May 16, 2014, at 9:47 AM, Peter Lin wool...@gmail.com wrote:

Re: Storing log structured data in Cassandra without compactions for performance boost.

2014-05-16 Thread Kevin Burton
If the data is read from a slice of a partition that has been added over time there will be a part of that row in almost every sstable. That would mean all of them (multiple disk seeks depending on clustering order per sstable) would have to be read from in order to service the query. Data

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Colin
I used cassandra for years at NYSE and we were able to do what we wanted with cassandra by leveraging open source and internal development knowing that cassandra did what we wanted it to do and that no one could ever take the code away from us in a worst case scenario. Compare and contrast

Re: Data modeling for Pinterest-like application

2014-05-16 Thread ziju feng
Thanks for your answer, I really like the frequency of update vs read way of thinking. A related question is whether it is a good idea to denormalize on read-heavy part of data while normalize on other less frequently-accessed data? Our app will have a limited number of system managed boards

Re: Multi-dc cassandra keyspace

2014-05-16 Thread Tupshin Harper
It's often an excellent strategy. No known issues. -Tupshin On May 16, 2014 4:13 PM, Anand Somani meatfor...@gmail.com wrote: Hi, It seems like it should be possible to have a keyspace replicated only to a subset of DC's on a given cluster spanning across multiple DCs? Is there anything

Re: Mutation messages dropped

2014-05-16 Thread Mark Reddy
Yes, please see http://wiki.apache.org/cassandra/FAQ#dropped_messages for further details. Mark On Fri, May 9, 2014 at 12:52 PM, Raveendran, Varsha IN BLR STS varsha.raveend...@siemens.com wrote: Hello, I am writing around 10Million records continuously into a single node Cassandra

Index with same Name but different keyspace

2014-05-16 Thread mahesh rajamani
Hi, I am using Cassandra version 2.0.5. I am trying to set up 2 keyspaces with the same tables for different testing. While creating indexes on the tables, I realized I am not able to use the same index name even though the tables are in different keyspaces. Is maintaining unique index name across keyspace is

Re: Counter column family performance problems

2014-05-16 Thread Chris Lohfink
What version are you using? and what consistency level are you using for your inserts? A CL.ONE for instance can end up with a large backup in the replicateOnWrite (or CounterMutation depending on version) stage since it happens outside the feedback loop from the request and can be a little

Running Production Cluster at Rackspace

2014-05-16 Thread Jan Algermissen
Hi, can anyone point me to recommendations for hosting and configuration requirements when running a Production Cassandra Cluster at Rackspace? Are there reference projects that document the suitability of Rackspace for running a production Cassandra cluster? Jan

RE: Cassandra token range support for Hadoop (ColumnFamilyInputFormat)

2014-05-16 Thread Anton Brazhnyk
Hi Paulo, I’m using C* 1.2.15 and have no easy option to upgrade (at least not to the 2.0.* branch). I’ve started to look at whether I can implement my own variant of InputFormat. Thanks a lot for the hint, I will certainly check how it’s done in 2.0.6 and whether it’s possible to backport it to the 1.2.* branch.

null date bug? Not sure if its cassandra 2.0.5 or the gocql (golang) driver.

2014-05-16 Thread Jacob Rhoden
I'm noticing the following strange behaviour when I do a query on a table: cqlsh:mykeyspace select uuid, discontinued_from from mytable; uuid | discontinued_from --+--

Clustering order and secondary index

2014-05-16 Thread cbert...@libero.it
Hi all, I'm trying to migrate my old project, born with Cassandra 0.6 and grown with 0.7/1.0, to the latest 2.0. I have an easy question for you all: do queries using only secondary indexes not respect any clustering order? Thanks

Re: Efficient bulk range deletions without compactions by dropping SSTables.

2014-05-16 Thread Paulo Ricardo Motta Gomes
Hello Kevin, In 2.0.X an SSTable is automatically dropped if it contains only tombstones: https://issues.apache.org/jira/browse/CASSANDRA-5228. However this will most likely happen if you use LCS. STCS will create sstables of larger size that will probably have mixed expired and unexpired data.

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Kevin Burton
so 30%… according to that data. On Thu, May 15, 2014 at 4:59 PM, Michael Shuler mich...@pbandjelly.org wrote: On 05/14/2014 03:39 PM, Kevin Burton wrote: I'm curious what % of cassandra developers are employed by Datastax? http://wiki.apache.org/cassandra/Committers -- Kind regards,