Re: Unit Testing Cassandra

2013-06-19 Thread Stephen Connolly
Unit testing means testing the smallest parts of your code in isolation. Unit tests should not take more than a few milliseconds to set up and verify their assertions. As such, if your code is not factored well for testing, you would typically use mocking (either by hand, or with mocking libraries) to mock
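A minimal sketch of the mocking approach described above, written in Python with the standard-library `unittest.mock` rather than a Java mocking library. `AnswerStore` and `count_correct_answers` are hypothetical names: the data-access object is replaced by a mock, so the test never touches a Cassandra node and runs in milliseconds.

```python
from unittest.mock import Mock


class AnswerStore:
    """Hypothetical DAO; a real one would query Cassandra through a driver."""

    def fetch_results(self, user_id):
        raise NotImplementedError("talks to Cassandra in production")


def count_correct_answers(store, user_id):
    """Business logic under test: counts results equal to 'correct'."""
    return sum(1 for result in store.fetch_results(user_id) if result == "correct")


def test_count_correct_answers():
    # spec=AnswerStore makes the mock reject methods the real class lacks.
    store = Mock(spec=AnswerStore)
    store.fetch_results.return_value = ["correct", "wrong", "correct"]

    assert count_correct_answers(store, 123) == 2
    # Check that the logic asked for the right user's data, exactly once.
    store.fetch_results.assert_called_once_with(123)


test_count_correct_answers()
```

The code under test only depends on the `fetch_results` method, which is what makes it mockable; code that builds its own Cassandra connection internally would first need to accept the store as a parameter.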

Re: Reduce Cassandra GC

2013-06-19 Thread Joel Samuelsson
My Cassandra ps info: root 26791 1 0 07:14 ? 00:00:00 /usr/bin/jsvc -user cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log -cp

Rolling upgrade from 1.1.12 to 1.2.5 visibility issue

2013-06-19 Thread Polytron Feng
Hi, We are trying to do a rolling upgrade from 1.0.12 to 1.2.5, but we found that the 1.2.5 node cannot see the other, older nodes. Therefore, we tried upgrading to 1.1.12 first, and that worked. However, we still saw the same issue when doing a rolling upgrade from 1.1.12 to 1.2.5. This seems to be the fixed issue as

Re: Reduce Cassandra GC

2013-06-19 Thread Takenori Sato
GC options are not set. You should see the following: -XX:+PrintGCDateStamps -XX:+PrintPromotionFailure -Xloggc:/var/log/cassandra/gc-1371603607.log Is it normal to have two processes like this? No. You are running two processes. On Wed, Jun 19, 2013 at 4:16 PM, Joel Samuelsson
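A small sketch for checking Takenori's point on a running node: split the JVM command line (for example, copied from `ps -ef` output) and report which of the GC logging flags he lists are absent. The helper name `missing_gc_flags` is hypothetical, and because the log file name embeds a timestamp, `-Xloggc:` is matched by prefix only.

```python
import shlex

# Flags listed above as expected when GC logging is enabled.
EXPECTED_GC_FLAGS = ("-XX:+PrintGCDateStamps", "-XX:+PrintPromotionFailure")
LOGGC_PREFIX = "-Xloggc:"  # file name varies, so only the prefix is checked


def missing_gc_flags(cmdline):
    """Return the GC logging flags absent from a JVM command line."""
    args = shlex.split(cmdline)
    missing = [flag for flag in EXPECTED_GC_FLAGS if flag not in args]
    if not any(arg.startswith(LOGGC_PREFIX) for arg in args):
        missing.append(LOGGC_PREFIX + "<file>")
    return missing


line = ("/usr/bin/jsvc -user cassandra -XX:+PrintGCDateStamps "
        "-Xloggc:/var/log/cassandra/gc.log")
print(missing_gc_flags(line))  # ['-XX:+PrintPromotionFailure']
```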

Re: Reduce Cassandra GC

2013-06-19 Thread Joel Samuelsson
Right, after getting the GC logging information I tested upgrading to 1.2. It didn't help, but I forgot to re-enable the GC options. No. You are running two processes. OK, that's weird. I am using an unmodified version of a startup script in /etc/init.d/cassandra from the Debian package. Here's some

RE: Data not fully replicated with 2 nodes and replication factor 2

2013-06-19 Thread James Lee
The test tool I am using catches any exceptions on the original writes and resubmits the write request until it's successful (bailing out after 5 failures). So for each key Cassandra has reported a successful write. Nodetool says the following - I'm guessing the pending hinted handoff is the

Re: Reduce Cassandra GC

2013-06-19 Thread Fabrice Facorat
2013/6/19 Takenori Sato ts...@cloudian.com: GC options are not set. You should see the following: -XX:+PrintGCDateStamps -XX:+PrintPromotionFailure -Xloggc:/var/log/cassandra/gc-1371603607.log Is it normal to have two processes like this? No. You are running two processes. It's normal

TTL can't be specified at column level using CQL 3 in Cassandra 1.2.x

2013-06-19 Thread Amresh Kumar Singh
Hi, Using Thrift, we are allowed to specify different TTL values for each column in a row. But CQL3 doesn't provide a way for this. For instance, this is allowed: INSERT INTO users (user_name, password, gender, state) VALUES ('xamry2', 'aa', 'm', 'UP') using TTL 5; But something

Re: TTL can't be specified at column level using CQL 3 in Cassandra 1.2.x

2013-06-19 Thread Sylvain Lebresne
Hi, But CQL3 doesn't provide a way for this. That's not true. But the syntax is probably a bit more verbose than what you were hoping for. Your example (where I assume user_name is your partition key) can be achieved with: BEGIN BATCH UPDATE users SET password = 'aa' WHERE
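The batch form described above can be generated mechanically, which is roughly what a higher-level client would need to do. This is a sketch under assumptions: `per_column_ttl_batch` and `cql_text` are hypothetical helpers, every value is treated as text, and quoting only doubles single quotes, so a real client should prefer bound parameters over string building.

```python
def cql_text(value):
    """Render a Python string as a CQL text literal (single quotes doubled)."""
    return "'" + value.replace("'", "''") + "'"


def per_column_ttl_batch(table, key_column, key, columns):
    """Build a CQL3 batch that writes each column with its own TTL.

    columns maps column name -> (text value, TTL in seconds or None).
    """
    lines = ["BEGIN BATCH"]
    for name, (value, ttl) in columns.items():
        # In CQL3 UPDATE, the USING clause comes before SET.
        using = f" USING TTL {ttl}" if ttl is not None else ""
        lines.append(
            f"  UPDATE {table}{using} SET {name} = {cql_text(value)}"
            f" WHERE {key_column} = {cql_text(key)};"
        )
    lines.append("APPLY BATCH;")
    return "\n".join(lines)


print(per_column_ttl_batch(
    "users", "user_name", "xamry2",
    {"password": ("aa", 5), "gender": ("m", None)},
))
```

Each column gets its own UPDATE statement, so columns without a TTL simply omit the USING clause.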

RE: TTL can't be specified at column level using CQL 3 in Cassandra 1.2.x

2013-06-19 Thread Amresh Kumar Singh
Thanks Sylvain, I am working on a high-level client (Kundera) which, if users want, should be able to achieve this, even if that's uncommon. Writing an UPDATE batch in CQL is an approach that works and, as you say, performance is not impacted. In my opinion, an *optional* USING TTL with column

Real Use Cases in Cassandra !!!

2013-06-19 Thread varadarajan . v
Team, Can anyone share real use cases in Cassandra? Thanks & Regards, Varada Solution Architect/Business Information Management Services Practice Polaris Financial Technology Limited 6th Floor, West Wing, Nxt lvl, Navalur W:044-33418000*8613 M:9791700984 : VOIP:90-8613

RE: Real Use Cases in Cassandra !!!

2013-06-19 Thread Romain HARDOUIN
Hi, Have a look at DataStax's customers: http://www.datastax.com/customers varadaraja...@polarisft.com wrote on 19/06/2013 12:48:50: From: varadaraja...@polarisft.com To: user@cassandra.apache.org, Date: 19/06/2013 12:49 Subject: Real Use Cases in Cassandra !!! Team, Can

Re: Real Use Cases in Cassandra !!!

2013-06-19 Thread Elliot Thompson
Visit the Planet Cassandra website, hosted by DataStax. On 19 Jun 2013, at 13:21, Romain HARDOUIN romain.hardo...@urssaf.fr wrote: Hi, Have a look at DataStax's customers: http://www.datastax.com/customers varadaraja...@polarisft.com wrote on 19/06/2013 12:48:50: From:

Re: nodetool ring showing different 'Load' size

2013-06-19 Thread Rodrigo Felix
Thanks, Eric. Is there a way to start compaction manually? I'm thinking about doing it after loading the data and before starting the run phase of the benchmark. Thanks. Regards, Rodrigo Felix de Almeida LSBD - Universidade Federal do Ceará Project Manager MBA, CSM, CSPO, SCJP On Mon, Jun 17, 2013

Re: Dropped mutation messages

2013-06-19 Thread Shahab Yunus
Hello Arthur, What do you mean by "The queries need to be lightened"? Thanks, Shahab On Tue, Jun 18, 2013 at 8:47 PM, Arthur Zubarev arthur.zuba...@aol.com wrote: Cem hi, as per http://wiki.apache.org/cassandra/FAQ#dropped_messages Internode messages which are received by a node, but do

Re: Unit Testing Cassandra

2013-06-19 Thread Shahab Yunus
Thanks, Stephen, for your reply and explanation. My bad that I mixed those up and wasn't clear enough. Yes, I have two different requests/questions. 1) One is for the unit testing. 2) The second (which I am more interested in) is for performance (stress/load) testing. Let us keep integration aside for

token() function in CQL3 (1.2.5)

2013-06-19 Thread Ben Boule
Can anyone explain this to me? I have been looking through the source code but can't seem to find the answer. The documentation mentions using the token() function to change a value into its token for use in queries. It always mentions it as taking a single parameter: SELECT * FROM posts
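For intuition about what token() returns: with RandomPartitioner, a partition key's token is derived from the MD5 digest of the key's serialized bytes, read as a signed 128-bit integer and made non-negative. The sketch reproduces that for a single UTF-8 text key only. It is an illustration, not Cassandra code: Cassandra 1.2 defaults to Murmur3Partitioner, whose tokens are computed differently, and composite partition keys are serialized differently before hashing.

```python
import hashlib


def random_partitioner_token(key: str) -> int:
    """Approximate RandomPartitioner's token for a UTF-8 text partition key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Java's new BigInteger(byte[]) reads the digest as a signed,
    # big-endian two's-complement number; RandomPartitioner takes abs().
    return abs(int.from_bytes(digest, "big", signed=True))


for k in ("alice", "bob"):
    print(k, random_partitioner_token(k))
```

Because tokens are hashes, paging with `token(key) > token(...)` walks rows in hash order, not in key order.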

Re: nodetool ring showing different 'Load' size

2013-06-19 Thread Michal Michalski
You can start compaction via JMX if you need it and you know what you're doing: find the org.apache.cassandra.db:type=CompactionManager MBean and the forceUserDefinedCompaction operation in it. The first argument is the keyspace name; the second is a comma-separated list of SSTables to compact (file names). You

Re: Unit Testing Cassandra

2013-06-19 Thread Hiller, Dean
For unit testing, we actually use PlayOrm which has an in-memory version of nosql so we just write unit tests against our code which uses the in-memory version but that is only if you are in java. Later, Dean From: Shahab Yunus shahab.yu...@gmail.com Reply-To:

RE: Unit Testing Cassandra

2013-06-19 Thread Ben Boule
Hi Shahab, Cassandra-Unit has been helpful for us for running unit tests without requiring a real Cassandra instance to be running. We only use this to test our DAO code which interacts with the Cassandra client. It basically starts up an embedded instance of Cassandra and fools your

timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
I'm experimenting with a data model that will need to ingest a lot of data that will need to be queryable by time. In the example below, I want to be able to run a query like select * from count3 where counter = 'test' and ts > minTimeuuid('2013-06-18 16:23:00') and ts < minTimeuuid('2013-06-18

Re: Unit Testing Cassandra

2013-06-19 Thread Edward Capriolo
You really do not need much in Java; you can use the embedded server. Hector wraps a simple class around this, called EmbeddedServerHelper. On Wednesday, June 19, 2013, Ben Boule ben_bo...@rapid7.com wrote: Hi Shabab, Cassandra-Unit has been helpful for us for running unit tests without requiring

Re: vnodes ready for production ?

2013-06-19 Thread Jim Ancona
On Tue, Jun 18, 2013 at 4:04 AM, aaron morton aa...@thelastpickle.com wrote: Even better, if we could automate some scaling up with AWS alarms, it would be awesome. I saw a demo for Priam (https://github.com/Netflix/Priam) doing that at Netflix in March, not sure if it's public yet. Are the

DC dedicated to Hadoop jobs

2013-06-19 Thread cscetbon.ext
Hi, Our Hadoop jobs will only do READs and we want to restrict reads to this dedicated DC even if performance is poor. What can we do to achieve this goal? - set dynamic_snitch_badness_threshold to 0.98 on this DC's nodes? can we have different dynamic_snitch_badness_threshold values on

Re: timeuuid and cql3 query

2013-06-19 Thread Tyler Hobbs
On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent br...@cvent.com wrote: CREATE TABLE count3 ( counter text, ts timeuuid, key1 text, value int, PRIMARY KEY ((counter, ts)) ) Instead of doing a composite partition key, remove a set of parens and let ts be your clustering key. That

Re: token() function in CQL3 (1.2.5)

2013-06-19 Thread Tyler Hobbs
On Wed, Jun 19, 2013 at 7:47 AM, Ben Boule ben_bo...@rapid7.com wrote: Can anyone explain this to me? I have been looking through the source code but can't seem to find the answer. The documentation mentions using the token() function to change a value into its token for use in queries.

Re: timeuuid and cql3 query

2013-06-19 Thread Davide Anastasia
Hi Tyler, I am interested in this scenario as well: could you please elaborate further on your answer? Thanks a lot, Davide On 19 Jun 2013 16:01, Tyler Hobbs ty...@datastax.com wrote: On Wed, Jun 19, 2013 at 8:08 AM, Ryan, Brent br...@cvent.com wrote: CREATE TABLE count3 ( counter text,

Re: timeuuid and cql3 query

2013-06-19 Thread Sylvain Lebresne
You're using the ordered partitioner, right? On Wed, Jun 19, 2013 at 5:06 PM, Davide Anastasia davide.anasta...@gmail.com wrote: Hi Tyler, I am interested in this scenario as well: could you please elaborate further your answer? Thanks a lot, Davide On 19 Jun 2013 16:01, Tyler Hobbs

Re: timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
I'm using the byte ordered partitioner. Sent from my iPhone On Jun 19, 2013, at 11:26 AM, Sylvain Lebresne sylv...@datastax.com wrote: You're using the ordered partitioner, right? On Wed, Jun 19, 2013 at 5:06 PM, Davide Anastasia

Re: Reduce Cassandra GC

2013-06-19 Thread Mohit Anchlia
How much data do you have per node? How much RAM per node? How much CPU per node? What is the avg CPU and memory usage? On Wed, Jun 19, 2013 at 12:16 AM, Joel Samuelsson samuelsson.j...@gmail.com wrote: My Cassandra ps info: root 26791 1 0 07:14 ? 00:00:00 /usr/bin/jsvc

Re: timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
Tyler, You're recommending this schema instead, correct? CREATE TABLE count3 ( counter text, ts timeuuid, key1 text, value int, PRIMARY KEY (ts, counter) ) I believe I tried this as well and ran into similar problems but I'll try it again. I'm using the ByteOrderedPartitioner if

Re: timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
Here's an example of that not working: cqlsh:Test> desc table count4; CREATE TABLE count4 ( ts timeuuid, counter text, key1 text, value int, PRIMARY KEY (ts, counter) ) WITH bloom_filter_fp_chance=0.01 AND caching='KEYS_ONLY' AND comment='' AND

Re: timeuuid and cql3 query

2013-06-19 Thread Ryan, Brent
Note that it seems to work when you structure your schema in this example below, BUT this is a problem because all of my data will wind up hitting a single node in my cassandra cluster because the partitioning key is counter and that isn't unique enough. I was hoping that I wasn't going to

Date range queries

2013-06-19 Thread Christopher J. Bottaro
Hello, We are considering using Cassandra and I want to make sure our use case fits Cassandra's strengths. We have a table like: answers --- user_id | question_id | result | created_at Where our most common query will be something like: SELECT * FROM answers WHERE user_id = 123 AND

Re: nodetool ring showing different 'Load' size

2013-06-19 Thread Robert Coli
On Wed, Jun 19, 2013 at 5:47 AM, Michal Michalski mich...@opera.com wrote: You can also perform a major compaction via nodetool compact (for SizeTieredCompaction), but - again - you really should not do it unless you're really sure what you're doing, as it compacts all the SSTables together, which

Re: Date range queries

2013-06-19 Thread David McNelis
I think you'd be better served by a slightly different primary key. If your primary key was (user_id, created_at) or (user_id, created_at, question_id), then you'd be able to run the above query without a problem. This will mean that all of the data for a specific user_id will be

Joining distinct clusters with the same schema together

2013-06-19 Thread Faraaz Sareshwala
My company is planning on deploying Cassandra to three separate datacenters. Each datacenter will have a Cassandra cluster with a separate set of seeds specific to that datacenter. However, the cluster name will be the same. Question 1: is this enough to guarantee that the three datacenters will

Re: timeuuid and cql3 query

2013-06-19 Thread Sylvain Lebresne
So part of it is a bug, namely https://issues.apache.org/jira/browse/CASSANDRA-5666. In summary, CQL3 should not accept: ts > minTimeuuid('2013-06-17 22:36:16') and ts < minTimeuuid('2013-06-20 22:44:02'), because it does not know how to handle it properly. What it should support is token(ts)
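One reason a range over a timeuuid partition key is not meaningful, even with ByteOrderedPartitioner (where a key's token is its raw bytes): a version-1 UUID stores the low 32 bits of its 60-bit timestamp first, so comparing raw bytes does not follow chronological order. The sketch builds deterministic version-1 UUIDs with Python's `uuid` module; the fixed clock sequence and node values and the helper name `timeuuid_at` are arbitrary assumptions. As far as I understand, Cassandra's timeuuid comparator orders by the embedded timestamp, which is why a timeuuid works as a clustering column while raw token order does not.

```python
import uuid

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def timeuuid_at(unix_seconds, clock_seq=0, node=0x010203040506):
    """Build a version-1 (time-based) UUID for a given Unix time."""
    ts = unix_seconds * 10_000_000 + UUID_EPOCH_OFFSET
    time_low = ts & 0xFFFFFFFF               # stored first in the byte layout
    time_mid = (ts >> 32) & 0xFFFF
    time_hi_version = ((ts >> 48) & 0x0FFF) | 0x1000    # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


start = 1371600000  # 2013-06-19 00:00:00 UTC
ids = [timeuuid_at(start + 60 * i) for i in range(10)]  # one per minute

by_time = sorted(ids, key=lambda u: u.time)  # chronological order
by_bytes = sorted(ids)                       # UUID comparison = raw byte order
print(by_time == ids)       # True: ids were generated in time order
print(by_bytes == by_time)  # False: time_low wraps within ten minutes
```

Ten minutes span more than 2**32 intervals of 100 ns, so time_low must wrap at least once in the list, and the UUID created just after the wrap sorts before earlier ones by bytes.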

Re: Joining distinct clusters with the same schema together

2013-06-19 Thread Robert Coli
On Wed, Jun 19, 2013 at 10:50 AM, Faraaz Sareshwala fsareshw...@quantcast.com wrote: Each datacenter will have a Cassandra cluster with a separate set of seeds specific to that datacenter. However, the cluster name will be the same. Question 1: is this enough to guarantee that the three

Re: Heap is not released and streaming hangs at 0%

2013-06-19 Thread Wei Zhu
If you want, you can try to force a GC through JConsole: Memory > Perform GC. It theoretically triggers a full GC, and when it will happen depends on the JVM. -Wei - Original Message - From: Robert Coli rc...@eventbrite.com To: user@cassandra.apache.org Sent: Tuesday, June 18, 2013

Re: Joining distinct clusters with the same schema together

2013-06-19 Thread Eric Stevens
On its face my answer is not... really? What do you view yourself as getting with this technique versus using built in replication? As an example, you lose the ability to do LOCAL_QUORUM vs EACH_QUORUM consistency level operations? Doing replication manually sounds like a recipe for the
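To make the consistency-level point concrete: Cassandra's quorum for a set of replicas is a strict majority, floor(RF / 2) + 1. The sketch counts how many replica acknowledgements a write waits for at LOCAL_QUORUM, EACH_QUORUM and QUORUM in one cluster that spans several datacenters. The helper names and the layout of 3 replicas in each of three datacenters are hypothetical; with three separate clusters, as in Faraaz's plan, each cluster only knows its own replicas and these cross-datacenter levels do not apply.

```python
def quorum(replicas: int) -> int:
    """Strict majority of a replica count."""
    return replicas // 2 + 1


def required_acks(level: str, rf_per_dc: dict, local_dc: str) -> int:
    """Replica acknowledgements a write waits for at the given level."""
    if level == "LOCAL_QUORUM":
        # Majority of the coordinator's own datacenter only.
        return quorum(rf_per_dc[local_dc])
    if level == "EACH_QUORUM":
        # A majority in every datacenter (total shown here).
        return sum(quorum(rf) for rf in rf_per_dc.values())
    if level == "QUORUM":
        # Majority of all replicas, wherever they are.
        return quorum(sum(rf_per_dc.values()))
    raise ValueError(f"unsupported level: {level}")


layout = {"dc1": 3, "dc2": 3, "dc3": 3}  # hypothetical per-DC replication
for level in ("LOCAL_QUORUM", "EACH_QUORUM", "QUORUM"):
    print(level, required_acks(level, layout, "dc1"))  # 2, 6, 5
```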

Re: Data not fully replicated with 2 nodes and replication factor 2

2013-06-19 Thread Wei Zhu
You have a lot of Dropped Mutations which means those writes might not go through. Since you have CL.ONE as write consistency, your client doesn't see the exception if write fails only on one node. I think hints are only stored when the other node is down, not on the dropped mutations.
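Wei's point can be checked with the usual overlap rule: a read is guaranteed to see the latest acknowledged write only when the replicas read plus the replicas written exceed the replication factor (R + W > RF). The sketch is plain arithmetic for a single datacenter; the helper names are hypothetical, only ONE, QUORUM and ALL are mapped, and reading at ONE is an assumption since the thread only states the write level.

```python
def replicas_for(level: str, rf: int) -> int:
    """Replicas that must respond at a consistency level (single DC)."""
    counts = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return counts[level]


def overlaps(read_level: str, write_level: str, rf: int) -> bool:
    """True if every read set must intersect every write set (R + W > RF)."""
    return replicas_for(read_level, rf) + replicas_for(write_level, rf) > rf


# Two nodes, replication factor 2, writes at ONE (reads at ONE assumed).
print(overlaps("ONE", "ONE", 2))        # False: a read may miss the write
print(overlaps("ONE", "ALL", 2))        # True
print(overlaps("QUORUM", "QUORUM", 2))  # True: QUORUM of 2 replicas is 2
```

With RF 2 and writes at ONE, a write that is dropped on one replica still reports success, which matches what the test tool observed.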

Re: timeuuid and cql3 query

2013-06-19 Thread Francisco Andrades Grassi
Hi, I believe what he's recommending is: CREATE TABLE count3 ( counter text, ts timeuuid, key1 text, value int, PRIMARY KEY (counter, ts) ) That way counter will be your partitioning key, and all the rows that have the same counter value will be clustered (stored as a single wide row
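A toy in-memory model of the layout described above, showing why `PRIMARY KEY (counter, ts)` supports a time-range query: each partition keeps its rows sorted by the clustering column, so a range is one contiguous slice found by two binary searches. This is an illustration only, not how Cassandra stores data on disk; `WideRowTable` is a hypothetical name, and integers stand in for timeuuids.

```python
import bisect
from collections import defaultdict


class WideRowTable:
    """Toy model: partition key -> rows kept sorted by clustering key."""

    def __init__(self):
        self._keys = defaultdict(list)    # sorted clustering keys per partition
        self._values = defaultdict(list)  # values aligned with _keys

    def insert(self, partition, clustering, value):
        keys = self._keys[partition]
        i = bisect.bisect_left(keys, clustering)
        if i < len(keys) and keys[i] == clustering:
            self._values[partition][i] = value  # same primary key: overwrite
        else:
            keys.insert(i, clustering)
            self._values[partition].insert(i, value)

    def range(self, partition, lo, hi):
        """Rows with lo < clustering < hi within one partition."""
        keys = self._keys.get(partition, [])
        values = self._values.get(partition, [])
        start = bisect.bisect_right(keys, lo)
        end = bisect.bisect_left(keys, hi)
        return list(zip(keys[start:end], values[start:end]))


t = WideRowTable()
for ts, v in [(30, 3), (10, 1), (20, 2), (40, 4)]:
    t.insert("test", ts, v)
t.insert("other", 15, 99)
print(t.range("test", 10, 40))  # [(20, 2), (30, 3)]
```

The trade-off raised earlier in the thread is visible here too: every row for the counter `test` lives in one partition, and therefore on the same replicas.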

Re: Date range queries

2013-06-19 Thread Christopher J. Bottaro
Interesting, thank you for the reply. Two questions though... Why should created_at come before question_id in the primary key? In other words, why (user_id, created_at, question_id) instead of (user_id, question_id, created_at)? Given this setup, all a user's answers (all 10k) will be stored

Re: Date range queries

2013-06-19 Thread David McNelis
So, if you want to query by created_at and only occasionally limit by question_id, that is why you'd put created_at first. The way primary keys work is that the first part of the primary key is the partition key; that field is essentially the single Cassandra row. The second key is the order
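The argument about column order comes down to how compound clustering keys sort: rows in a partition are ordered by the first clustering column, then by the next. The sketch uses Python tuples, which compare the same lexicographic way, with made-up rows for one user and a hypothetical query window. With `(created_at, question_id)` the created_at range is one contiguous run; with `(question_id, created_at)` it is scattered across questions.

```python
# Made-up answers for one user: (question_id, created_at) pairs.
answers = [(1, 100), (2, 105), (1, 200), (3, 150), (2, 300)]


def contiguous(rows, in_range):
    """True if the rows matching in_range form one unbroken run."""
    flags = [in_range(r) for r in rows]
    if True not in flags:
        return True
    first = flags.index(True)
    last = len(flags) - 1 - flags[::-1].index(True)
    return all(flags[first:last + 1])


def created_in_window(created_at):
    return 100 <= created_at < 200  # hypothetical query window


# Clustering (created_at, question_id): sorted by time first.
time_first = sorted((c, q) for q, c in answers)
# Clustering (question_id, created_at): sorted by question first.
question_first = sorted(answers)

print(contiguous(time_first, lambda r: created_in_window(r[0])))      # True
print(contiguous(question_first, lambda r: created_in_window(r[1])))  # False
```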

Re: Data not fully replicated with 2 nodes and replication factor 2

2013-06-19 Thread Robert Coli
On Wed, Jun 19, 2013 at 11:43 AM, Wei Zhu wz1...@yahoo.com wrote: I think hints are only stored when the other node is down, not on the dropped mutations. (Correct me if I am wrong, actually it's not a bad idea to store hints for dropped mutations and replay them later?) This used to be the

error on startup: unable to find sufficient sources for streaming range

2013-06-19 Thread Faraaz Sareshwala
Hi, I couldn't find any information on the following error so I apologize if it has already been discussed. On some of my nodes, I'm getting the following exception when cassandra starts up: 2013-06-19 22:17:39.480414500 Exception encountered during startup: unable to find sufficient sources

Performance Difference between Cassandra version

2013-06-19 Thread Raihan Jamal
I am trying to find out whether there is any performance difference between Cassandra 1.0.8 and Cassandra 1.2.2, mainly for reads. Has anyone seen any major performance difference?

Re: Performance Difference between Cassandra version

2013-06-19 Thread Franc Carter
On Thu, Jun 20, 2013 at 9:18 AM, Raihan Jamal jamalrai...@gmail.com wrote: I am trying to see whether there will be any performance difference between Cassandra 1.0.8 vs Cassandra 1.2.2 for reading the data mainly? Has anyone seen any major performance difference? We are part way through a

Re: Data not fully replicated with 2 nodes and replication factor 2

2013-06-19 Thread Wei Zhu
Rob, Thanks. I was not aware of that. So we can avoid repair if there is no hardware failure...I found a blog: http://www.datastax.com/dev/blog/modern-hinted-handoff -Wei - Original Message - From: Robert Coli rc...@eventbrite.com To: user@cassandra.apache.org, Wei Zhu

Re: Unit Testing Cassandra

2013-06-19 Thread Shahab Yunus
Thanks Edward, Ben and Dean for the pointers. Yes, I am using Java and these sound promising for unit testing, at least. Regards, Shahab On Wed, Jun 19, 2013 at 9:58 AM, Edward Capriolo edlinuxg...@gmail.com wrote: You really do not need much in java you can use the embedded server. Hector