Re: two dimensional slicing

2012-01-23 Thread aaron morton
It depends a bit on the data and the query patterns. * How many versions do you have? * How many names in each version? * When querying do you know the version numbers you want to query from? How many are there normally? * How frequent are the updates and the reads? I would lean

Re: delay in data deleting in cassadra

2012-01-23 Thread aaron morton
also, *please* upgrade to the latest 0.8 release. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 21/01/2012, at 4:31 PM, Maxim Potekhin wrote: Did you run repairs within GC_GRACE all the time? On 1/20/2012 3:42 AM, Shammi

Re: Get all keys from the cluster

2012-01-23 Thread aaron morton
If you want to keep the load out of the cassandra process and do the join to sql off line, take a look at the bin/sstablekeys utility. This will let you output the keys in an sstable. You will need to do it for every sstable on every node, create the unique list and then check in your SQL db
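The merge step described above (collect the key listings from every sstable on every node, build the unique list, then compare with the SQL side) could be sketched roughly as follows. This is only an illustration: it assumes each sstablekeys run was redirected to a text file with one key per line, and the helper names `merge_keys` and `missing_from_sql` are hypothetical, not part of Cassandra.

```python
from typing import Iterable, Set


def merge_keys(listings: Iterable[Iterable[str]]) -> Set[str]:
    """Union the keys from several key listings (one key per line).

    Each listing is any iterable of lines, e.g. an open file holding the
    saved output of one sstablekeys run (assumed format: one key per line).
    """
    unique: Set[str] = set()
    for listing in listings:
        for line in listing:
            key = line.strip()
            if key:  # skip blank lines
                unique.add(key)
    return unique


def missing_from_sql(cassandra_keys: Set[str], sql_keys: Set[str]) -> Set[str]:
    """Keys present in Cassandra but absent from the SQL database."""
    return cassandra_keys - sql_keys


if __name__ == "__main__":
    # Two hypothetical listings; replicated keys appear on more than one node.
    node_a = ["6b657931\n", "6b657932\n"]
    node_b = ["6b657932\n", "6b657933\n", "\n"]
    keys = merge_keys([node_a, node_b])
    print(sorted(missing_from_sql(keys, {"6b657931"})))
```

Because replicas hold the same rows, the same key shows up in several listings, which is why the union is taken before checking against the SQL database.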

Re: delay in data deleting in cassadra

2012-01-23 Thread Shammi Jayasinghe
On Fri, Jan 20, 2012 at 11:02 PM, Peter Schuller peter.schul...@infidyne.com wrote: The problem occurs when this thread is invoked for the second time. In that step, it returns some of the data that I already deleted in the third step of the previous cycle. In order to get a guarantee

Re: Unbalanced cluster with RandomPartitioner

2012-01-23 Thread aaron morton
Setting a token outside of the partitioner range sounds like a bug. It's mostly an issue with the RP, but I guess a custom partitioner may also want to validate tokens are within a range. Can you report it to https://issues.apache.org/jira/browse/CASSANDRA Thanks - Aaron
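A minimal sketch of the kind of range check discussed above, assuming the RandomPartitioner token space runs from 0 to 2**127. The function names are hypothetical and this is not Cassandra's own validation code; it also shows the usual way of spacing tokens evenly for a balanced ring.

```python
from typing import List

# Assumed upper bound of the RandomPartitioner token space.
RP_MAX_TOKEN = 2 ** 127


def is_valid_rp_token(token: int) -> bool:
    """True if the token lies inside the assumed RandomPartitioner range."""
    return 0 <= token <= RP_MAX_TOKEN


def balanced_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial tokens for a ring of node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = RP_MAX_TOKEN // node_count
    return [i * step for i in range(node_count)]


if __name__ == "__main__":
    for token in balanced_tokens(4):
        print(token, is_valid_rp_token(token))
```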

Re: ideal cluster size

2012-01-23 Thread aaron morton
I second Peter's point, big servers are not always the best. My experience (using spinning disks) is that 200 to 300 GB of live data load per node (including replicated data) is a sweet spot. Above this the time taken for compaction, repair, off node backups, node moves etc. starts to be a pain.
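As a rough worked example of the rule of thumb above, the node count can be estimated from the unreplicated data size, the replication factor, and a per-node target. The function name and the sample figures are assumptions for illustration only.

```python
def nodes_needed(unique_data_gb: int, replication_factor: int,
                 target_gb_per_node: int = 300) -> int:
    """Estimate the node count for a per-node load target.

    The per-node target includes replicated data, so the total load is the
    unique data multiplied by the replication factor.
    """
    total_gb = unique_data_gb * replication_factor
    estimate = -(-total_gb // target_gb_per_node)  # ceiling division
    # A cluster needs at least as many nodes as replicas.
    return max(replication_factor, estimate)


if __name__ == "__main__":
    # 2 TB of unique data at RF=3 with a 300 GB per-node target.
    print(nodes_needed(2000, 3, 300))
```

With these sample numbers the total load is 6000 GB, so the estimate comes out at 20 nodes.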

Re: Data Model Question

2012-01-23 Thread aaron morton
1. regarding time slicing, if at any point of time I am interested in what happened in the last T minutes, then I will need to query more than one row of the DimentionUpdates, right? Yerp. Sometimes that is what's needed. 2. What did you mean by You will also want to partition the list
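To illustrate why a query for the last T minutes can touch more than one row, here is a small sketch. It assumes, purely for illustration, that each row covers a fixed-length time bucket and is keyed by the bucket's start time in epoch seconds; the thread does not state the actual key scheme, and `bucket_keys` is a hypothetical helper.

```python
from typing import List


def bucket_keys(now_s: int, window_minutes: int, bucket_minutes: int) -> List[int]:
    """Row keys (bucket start times) that overlap the last window_minutes."""
    bucket_s = bucket_minutes * 60
    start_s = now_s - window_minutes * 60
    first = (start_s // bucket_s) * bucket_s  # bucket containing window start
    last = (now_s // bucket_s) * bucket_s     # bucket containing "now"
    return list(range(first, last + bucket_s, bucket_s))


if __name__ == "__main__":
    now = 10 * 3600 + 5 * 60  # 10:05, as seconds since midnight
    # A 10 minute window starting at 09:55 spans two hourly buckets.
    print(bucket_keys(now, window_minutes=10, bucket_minutes=60))
```

Each returned key would then be read with a column slice limited to the requested time range.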

Re: get all columns for a row

2012-01-23 Thread aaron morton
The columns are stored at the intersection of the row and the CF. So if you read all the columns for a row in a CF you are only getting those ones. Your hector code (using the range) looks correct to me. Have fun. Aaron - Aaron Morton Freelance Developer @aaronmorton

Re: delay in data deleting in cassadra

2012-01-23 Thread Shammi Jayasinghe
On Mon, Jan 23, 2012 at 2:10 PM, Shammi Jayasinghe sha...@wso2.com wrote: On Fri, Jan 20, 2012 at 11:02 PM, Peter Schuller peter.schul...@infidyne.com wrote: The problem occurs when this thread is invoked for the second time. In that step, it returns some of the data that I already

Tips for using OrderedPartitioner

2012-01-23 Thread Tharindu Mathew
Hi, We use Cassandra in a way where we always want to run range slice queries. Because of the tendency to create hotspots with OrderedPartitioner, we decided to use RandomPartitioner. Then we would use a row as an index row, holding values of the other row keys of the CF. I feel this has become a burden

Re: CQL jdbc

2012-01-23 Thread Tamar Fraenkel
If I understand correctly this is due in Cassandra 1.1. Does anyone know when it is planned to be released? Thanks Tamar On January 23, 2012 at 11:13 AM Jawahar Prasad w3engine...@gmail.com wrote: Hi.. Yes there is. But just 2 days back, they have released a patch:

datastax opscenter authentication

2012-01-23 Thread Ramesh Natarajan
I am trying to integrate opscenter in our environment and I was wondering if we can use PAM authentication instead of a password file for opscenter authentication? thanks Ramesh

Re: CQL jdbc

2012-01-23 Thread Eric Evans
On Mon, Jan 23, 2012 at 8:40 AM, Alex Major al3...@gmail.com wrote: Based on current discussions it looks like it will be in C* 1.1, but won't be in the default cql package - you'll need to opt into cql3 driver as there are some incompatible BC changes and they want to give an easier migration.

architectural understanding of write operation node flow

2012-01-23 Thread Peter Dijkshoorn
Hi guys, I got an architectural question about how a write operation flows through the nodes. As far as I understand now, a client sends its write operation to whatever node it was set to use and if that node does not contain the data for this key K, then this node forwards the operation to the

Re: CQL jdbc

2012-01-23 Thread Alex Major
Think there's some confusion as Tamar has two emails in the same thread addressing two separate concerns. I was referring to the discussion over Composite Key support that Tamar quoted in his second email (the one that I replied to and quoted), not his first/original question about JDBC. Unless

Re: get all columns for a row

2012-01-23 Thread Tamar Fraenkel
Thanks. Tamar On January 23, 2012 at 11:24 AM aaron morton aa...@thelastpickle.com wrote: The columns are stored at the intersection of the row and the CF. So if you read all the columns for a row in a CF you are only getting those ones. Your hector code (using the range) looks correct

Re: architectural understanding of write operation node flow

2012-01-23 Thread Daniel Doubleday
Your first thought was pretty much correct: 1. The node which is called by the client is the coordinator 2. The coordinator determines the nodes in the ring which can handle the request ordered by expected latency (via snitch). The coordinator may or may not be part of these nodes 3. Given the

Re: architectural understanding of write operation node flow

2012-01-23 Thread Daniel Doubleday
Ouch :-) you were asking write ... Well kind of similar 1. Coordinator calculates all nodes 2. If not enough (according to CL) nodes are alive it throws unavailable 3. If nodes are down and hh is enabled it writes a hint for that row 4. It sends write request to all nodes (including
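The write steps listed above can be sketched as a toy model. This is not Cassandra code: the function names, the string consistency levels, and the returned dictionary are all assumptions made for illustration, and the model sends only to live replicas because the original step 4 is cut off.

```python
from typing import Dict, List, Set


class UnavailableError(Exception):
    """Raised when too few replicas are alive for the consistency level."""


def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Acknowledgements needed for the given consistency level."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return replication_factor // 2 + 1
    if consistency_level == "ALL":
        return replication_factor
    raise ValueError("unsupported consistency level: " + consistency_level)


def coordinate_write(replicas: List[str], alive: Set[str],
                     consistency_level: str,
                     hinted_handoff: bool = True) -> Dict[str, object]:
    """Toy coordinator: check availability, record hints, pick targets."""
    # Step 1: the replicas for the key are assumed to be already calculated.
    live = [node for node in replicas if node in alive]
    down = [node for node in replicas if node not in alive]
    needed = required_acks(consistency_level, len(replicas))
    # Step 2: fail fast if the consistency level cannot be met.
    if len(live) < needed:
        raise UnavailableError(
            "need %d live replicas, have %d" % (needed, len(live)))
    # Step 3: keep a hint for each down replica when hinted handoff is on.
    hints = down if hinted_handoff else []
    # Step 4: the write goes to the live replicas (simplified).
    return {"sent_to": live, "hints_for": hints, "acks_needed": needed}


if __name__ == "__main__":
    print(coordinate_write(["a", "b", "c"], {"a", "b"}, "QUORUM"))
```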

Re: datastax opscenter authentication

2012-01-23 Thread Nick Bailey
Unfortunately the current method using a password file is the only option for authentication in OpsCenter at the moment. I've noted PAM authentication as a feature request though. On Mon, Jan 23, 2012 at 8:16 AM, Ramesh Natarajan rames...@gmail.com wrote: I am trying to integrate opscenter in

Hive + Cassandra tutorial

2012-01-23 Thread Tharindu Mathew
Hi, I'm trying to experiment with Hive using Data in Cassandra. Brisk looks good, but I'm interested in running the map reduce jobs on HDFS not on CFS. I'm taking a look at [1], but couldn't figure out how to run it with a Cassandra cluster. I was wondering is there a simple word count example

Re: Hive + Cassandra tutorial

2012-01-23 Thread Jeremy Hanna
Take a look at http://wiki.apache.org/cassandra/HadoopSupport and in the source download of cassandra there's a contrib/pig section that has a wordcount example. On Jan 23, 2012, at 1:16 PM, Tharindu Mathew wrote: Hi, I'm trying to experiment with Hive using Data in Cassandra. Brisk looks

Re: Hive + Cassandra tutorial

2012-01-23 Thread Tharindu Mathew
Hi Jeremy, Thanks for the reply. I was looking for a similar sample for Hive. I've already gone through the Pig sample. I probably wasn't clear about this in my initial mail. On Tue, Jan 24, 2012 at 12:55 AM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote: Take a look at

atomicity of a row write

2012-01-23 Thread Guy Incognito
hi all, having read: http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic i would like some clarification: is a write to a single row key in a single column family atomic in the sense that i can do a batch mutate where i 1) write col 'A' to key 'B' 2) write col 'C' to key 'B' and

Return list by limit 1 which is NOT null

2012-01-23 Thread Eric Martell
Hi, I am trying to create a list of keys from which I fetch a key and then delete the same key in the subsequent call. When I use the CLI list, it always returns the first row due to the tombstone. Is there a way I can specify limit 1 and have it return a NOT null value? Please let me know. Thanks

Re: CQL jdbc

2012-01-23 Thread Eric Evans
On Mon, Jan 23, 2012 at 10:49 AM, Alex Major al3...@gmail.com wrote: Think there's some confusion as Tamar has two emails in the same thread addressing two separate concerns. I was referring to the discussion over Composite Key support that Tamar quoted in his second email (the one that I

Re: Cassandra performance question

2012-01-23 Thread Jonathan Ellis
Can you elaborate on what exactly you were testing on the Cassandra side? It sounds like what this post refers to as node encryption corresponds to enabling internode_encryption: all, but I couldn't guess what your client encryption is since Cassandra doesn't support that out of the box yet.

Re: Cassandra performance question

2012-01-23 Thread Chris Marino
Hi Jonathan, yes, when I say 'node encryption' I mean inter-Cassandra node encryption. When I say 'client encryption' I mean encrypted traffic from the Cassandra nodes to the clients. For these benchmarks we used the stress test client load generator. We ran tests with no encryption, then with

Re: Hive + Cassandra tutorial

2012-01-23 Thread Tharindu Mathew
Any idea whether the hive functionality will be merged to the Cassandra source? On Tue, Jan 24, 2012 at 1:00 AM, Tharindu Mathew mcclou...@gmail.com wrote: Hi Jeremy, Thanks for the reply. I was looking for a similar sample for Hive. I've already gone through the Pig sample. I probably