Re: Is there any way to fetch all data efficiently from a column family?

2013-01-30 Thread dong.yajun
Thanks Michael. I will run a benchmark using Hadoop Map/Reduce (example...) in our cluster, and I will let you know of any valuable information. :) Best, On Wed, Jan 30, 2013 at 2:39 PM, Michael Kjellman mkjell...@barracuda.com wrote: And finally to make wide rows with C* and Hadoop even

how RandomPartitioner calculate tokens

2013-01-30 Thread Manu Zhang
Hi, As per the DataStax Cassandra Documentation 1.2, for single data center deployments, tokens are calculated by dividing the hash range by the number of nodes in the cluster. Does it mean we have to recalculate the tokens of keys when nodes come and go? for multiple data center
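
A minimal sketch of the calculation the question refers to, assuming RandomPartitioner's token range of 0 to 2**127. The function name initial_tokens is made up for illustration and is not part of Cassandra. These are the tokens assigned to nodes, not to keys: a key's token comes from hashing the key itself, so it does not depend on the node count.

```python
# Assumption: RandomPartitioner tokens span 0 .. 2**127.
RING_SIZE = 2 ** 127


def initial_tokens(node_count):
    """Return evenly spaced node tokens for a single data center.

    Hypothetical helper: divides the hash range by the number of nodes,
    as the documentation quoted above describes.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(initial_tokens(4)):
        print(f"node {index}: {token}")
```

Adding or removing a node changes which token ranges each node owns, so the node tokens may need rebalancing, while the hash of any given key stays the same.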

Re: how RandomPartitioner calculate tokens

2013-01-30 Thread Sylvain Lebresne
I'll admit that this part of the DataStax documentation is a bit confusing (and I'll reach out to the doc writers to make sure this is improved). The partitioner (be it RandomPartitioner, Murmur3Partitioner or OrderPreservingPartitioner) is pretty much only a hash function that defines how to

Re: JDBC, Select * Cql2 vs Cql3 problem ?

2013-01-30 Thread Andy Cobley
Well this is getting stranger, for me with this simple table definition, select key,gender from users is also failing with a null pointer exception Andy On 29 Jan 2013, at 13:50, Andy Cobley acob...@computing.dundee.ac.uk wrote: When connecting to Cassandra 1.2.0 from CQLSH the table was

Re: how RandomPartitioner calculate tokens

2013-01-30 Thread Manu Zhang
On Wed 30 Jan 2013 05:47:59 PM CST, Sylvain Lebresne wrote: I'll admit that this part of the DataStax documentation is a bit confusing (and I'll reach out to the doc writers to make sure this is improved). The partitioner (be it RandomPartitioner, Murmur3Partitioner or

Multiple Data Center Clusters on Cassandra

2013-01-30 Thread adeel . akbar
Hi, I am running a 3-node Cassandra cluster with replication factor 2 in one DC. Now I need to run multiple data center clusters with Cassandra and I have the following queries: 1. I want to replicate the whole data set to another DC, after which both DCs' nodes should have the complete data. In which

RE: cryptic exception in Hadoop/Cassandra job

2013-01-30 Thread Pieter Callewaert
Hi Brian, Which version of cassandra are you using? And are you using the BOF to write to Cassandra? Kind regards, Pieter -Original Message- From: Brian Jeltema [mailto:brian.jelt...@digitalenvoy.net] Sent: Wednesday, 30 January 2013 13:20 To: user@cassandra.apache.org Subject: cryptic

Re: Multiple Data Center Clusters on Cassandra

2013-01-30 Thread Vivek Mishra
1. I want to replicate whole data on another DC and after that both DC's nodes should have complete Data. In which topology is it possible ? I think NetworkTopology is best suited for such a configuration. You may want to use nodetool to generate tokens accordingly. 2. If I need backup, what's the

Re: cryptic exception in Hadoop/Cassandra job

2013-01-30 Thread Brian Jeltema
Cassandra 1.1.5, using BulkOutputFormat Brian On Jan 30, 2013, at 7:39 AM, Pieter Callewaert wrote: Hi Brian, Which version of cassandra are you using? And are you using the BOF to write to Cassandra? Kind regards, Pieter -Original Message- From: Brian Jeltema

Re: Start token sorts after end token

2013-01-30 Thread Edward Capriolo
This was unexpected fallout from the change to the Murmur partitioner. A JIRA is open, but if you need MapReduce, Murmur is currently out of the question. On Wednesday, January 30, 2013, Tejas Patil tejas.patil...@gmail.com wrote: While reading data from Cassandra in map-reduce, I am getting

Re: Start token sorts after end token

2013-01-30 Thread Edward Capriolo
The fix is simply to switch to the random partitioner. On Wednesday, January 30, 2013, Edward Capriolo edlinuxg...@gmail.com wrote: This was unexpected fallout from the change to the Murmur partitioner. A JIRA is open, but if you need MapReduce, Murmur is currently out of the question. On Wednesday, January

Re: JDBC, Select * Cql2 vs Cql3 problem ?

2013-01-30 Thread Edward Capriolo
You really can't mix cql2 and cql3. Cql2 does not understand cql3's sparse tables. Technically it barfs all over the place. Cql2 is only good for contact tables. On Wednesday, January 30, 2013, Andy Cobley acob...@computing.dundee.ac.uk wrote: Well this is getting stranger, for me with this

RE: cryptic exception in Hadoop/Cassandra job

2013-01-30 Thread Pieter Callewaert
I have the same issue (but with sstableloaders). Should be fixed in 1.2 release (https://issues.apache.org/jira/browse/CASSANDRA-4813) Kind regards, Pieter -Original Message- From: Brian Jeltema [mailto:brian.jelt...@digitalenvoy.net] Sent: Wednesday, 30 January 2013 13:58 To:

Re: JDBC, Select * Cql2 vs Cql3 problem ?

2013-01-30 Thread Edward Capriolo
Darn autocorrect: cql2 is only good for compact tables. Make sure you are setting your cql version. Or frankly just switch to Hector / thrift and use things that have been known to work for years now. On Wednesday, January 30, 2013, Edward Capriolo edlinuxg...@gmail.com wrote: You really can't mix

Re: Node selection when both partition key and secondary index field constrained?

2013-01-30 Thread Edward Capriolo
Any query is going to fail with quorum + rf3 + 2 nodes down. One thing about 2x indexes (both user defined and built in) is that finding an answer using them requires more nodes to be up than just a single get or slice. On Monday, January 28, 2013, Mike Sample mike.sam...@gmail.com wrote: Thanks
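
The arithmetic behind the first sentence can be sketched as follows. This is only an illustration, assuming the usual quorum definition of floor(RF / 2) + 1 and assuming both down nodes are replicas for the requested key; the function names are hypothetical.

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a QUORUM request."""
    return replication_factor // 2 + 1


def can_satisfy_quorum(replication_factor, replicas_down):
    """True if enough replicas remain alive to reach quorum.

    Assumption: every down node counted here is a replica for the key.
    """
    live_replicas = replication_factor - replicas_down
    return live_replicas >= quorum(replication_factor)


if __name__ == "__main__":
    # RF 3 needs 2 replicas; with 2 replicas down only 1 is left.
    print(quorum(3), can_satisfy_quorum(3, 2), can_satisfy_quorum(3, 1))
```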

Re: Node selection when both partition key and secondary index field constrained?

2013-01-30 Thread Hiller, Dean
I recall someone doing some work in Astyanax and I don't know if it made it back in where astyanax would retry at a lower CL level when 2 nodes were down so things could continue to work which was a VERY VERY cool feature. You may want to look into that….I know at some point, I plan to.

Re: cryptic exception in Hadoop/Cassandra job

2013-01-30 Thread Brian Jeltema
I'm not sure this is the same problem. I'm getting these even when using a single reducer for the entire job. Brian On Jan 30, 2013, at 9:26 AM, Pieter Callewaert wrote: I have the same issue (but with sstableloaders). Should be fixed in 1.2 release

Re: Poor key cache hit rate

2013-01-30 Thread Edward Capriolo
You should not use the row cache and the key cache on the same cf. If that is what you are doing, it explains your numbers. Some docs suggest you can use them together, but in practice I have seen that when this is done the key cache hit rate drops to near 0. On Tuesday, January 29, 2013, Keith

Re: Node selection when both partition key and secondary index field constrained?

2013-01-30 Thread Edward Capriolo
Hector has this feature because Hector is awesome sauce, but Astyanax is new, sexy, and blogged about by Netflix. So the new cassandra trend to force everyone to use less functional new stuff is at work here, making you wish for something that already exists elsewhere. On Wednesday, January 30,

Re: Node selection when both partition key and secondary index field constrained?

2013-01-30 Thread Peter Lin
I'd also point out, Hector has better support for CQL3 features than Astyanax. I contributed some stuff to hector back in December, but I don't have time to apply those changes to astyanax. I have other contributions in mind for hector, which I hope to work on later this year. On Wed, Jan 30,

Nodetool can not get to 7199 after migrating to 1.2.1

2013-01-30 Thread Shahryar Sedghi
I migrated my test environment from 1.2.0 to 1.2.1 (DataStax Community) and nodetool cannot communicate with 7199, even though it is listening. On one node I get Failed to connect to 'cassandra4:7199': Connection refused; on another node I get a timeout. Did I do anything wrong when upgrading? Thanks

Suggestion: Move some threads to the client-dev mailing list

2013-01-30 Thread Edward Capriolo
A good portion of people and traffic on this list is questions about: 1) Astyanax 2) cassandra-jdbc 3) cassandra native client 4) pyhtondra / whatever With the exception of the native transport, which is only halfway part of Cassandra, none of these other client issues have much to do with

Re: Understanding Virtual Nodes on Cassandra 1.2

2013-01-30 Thread Manu Zhang
On Wed 30 Jan 2013 02:29:27 AM CST, Zhong Li wrote: One more question, can I add a virtual node manually without reboot and rebuild a host data? I checked nodetool command, there is no option to add a node. Thanks. Zhong On Jan 29, 2013, at 11:09 AM, Zhong Li wrote: I was misunderstood

Re: Suggestion: Move some threads to the client-dev mailing list

2013-01-30 Thread Vivek Mishra
I totally agree. -Vivek On Wed, Jan 30, 2013 at 8:51 PM, Edward Capriolo edlinuxg...@gmail.com wrote: A good portion of people and traffic on this list is questions about: 1) Astyanax 2) cassandra-jdbc 3) cassandra native client 4) pyhtondra / whatever With the exception of the native

Re: Inserting via thrift interface to column family created with Compound Key via cql3

2013-01-30 Thread Michael Kjellman
Are you using execute_cql3_query() ? On Jan 30, 2013, at 7:31 AM, Oleksandr Petrov oleksandr.pet...@gmail.com wrote: Hi, I'm creating a table via cql3 query like: CREATE TABLE posts ( userid text, blog_name text, entry_title text, posted_at text, PRIMARY KEY (userid,

Re: Inserting via thrift interface to column family created with Compound Key via cql3

2013-01-30 Thread Oleksandr Petrov
Yes, execute_cql3_query, exactly. On Wed, Jan 30, 2013 at 4:37 PM, Michael Kjellman mkjell...@barracuda.com wrote: Are you using execute_cql3_query() ? On Jan 30, 2013, at 7:31 AM, Oleksandr Petrov oleksandr.pet...@gmail.com wrote: Hi, I'm creating a table via cql3 query like:

Re: Uneven CPU load on a 4 node cluster

2013-01-30 Thread Jabbar
The high CPU node got replaced and now I'm not getting abnormally high CPU from one node. They are all evenly balanced now. On 29 January 2013 16:29, Jabbar aja...@gmail.com wrote: Hello, I've been testing a cluster of four identical Cassandra 1.2 nodes for a number of days. I have written a C#

Re: Inserting via thrift interface to column family created with Compound Key via cql3

2013-01-30 Thread Michael Kjellman
Did you pack the composite correctly? This exception normally shows up when the composite bytes are malformed. On Jan 30, 2013, at 7:45 AM, Oleksandr Petrov oleksandr.pet...@gmail.com wrote: Yes, execute_cql3_query, exactly. On Wed, Jan 30, 2013 at 4:37 PM,

Re: Inserting via thrift interface to column family created with Compound Key via cql3

2013-01-30 Thread Michael Kjellman
From src/java/org/apache/cassandra/db/marshal/CompositeType.java /* * The encoding of a CompositeType column name should be: * <component><component><component> ... * where <component> is: * <length of value><value><'end-of-component' byte> * where <length of value> is a 2 bytes unsigned short the and
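
The layout described in that comment can be sketched as below. This is an illustration rather than Cassandra's own code: pack_composite is a hypothetical helper, big-endian byte order for the length is an assumption, and the end-of-component byte defaults to 0 here, also as an assumption.

```python
import struct


def pack_composite(components, end_of_component=0):
    """Pack byte-string components as length + value + end-of-component.

    Hypothetical helper. Each component is written as a 2-byte unsigned
    short length (big-endian assumed), the value bytes, then one
    end-of-component byte.
    """
    packed = bytearray()
    for value in components:
        packed += struct.pack(">H", len(value))  # 2-byte unsigned length
        packed += value                          # raw component bytes
        packed.append(end_of_component)          # trailing marker byte
    return bytes(packed)


if __name__ == "__main__":
    print(pack_composite([b"2013-01-30", b"title"]).hex())
```

A malformed result, such as a missing length prefix or a missing trailing byte, is the kind of problem the previous message in this thread asks about.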

Chronos - Timeseries with Hector

2013-01-30 Thread Dan Simpson
Hello, I recently open sourced a WIP java library for handling timestamped data. I am looking for feedback/criticism and also interest. It was made primarily to process lots of small numeric values, without having to load the entire set into memory. Anyways, thoughts and feedback appreciated.

Re: Chronos - Timeseries with Hector

2013-01-30 Thread Dan Simpson
I'm sure it helps if I link the thing: https://github.com/dansimpson/chronos On Wed, Jan 30, 2013 at 8:39 AM, Dan Simpson dan.simp...@gmail.com wrote: Hello, I recently open sourced a WIP java library for handling timestamped data. I am looking for feedback/criticism and also interest. It

Re: Understanding Virtual Nodes on Cassandra 1.2

2013-01-30 Thread Zhong Li
You add a physical node and that in turn adds num_token tokens to the ring. No, I am talking about Virtual Nodes with the order preserving partitioner, for an existing host with a list of multiple tokens set in cassandra.initial_token. After initial bootstrapping, the host will not be aware of changes of

Re: Upcoming conferences

2013-01-30 Thread Brian Tarbox
At what level will the NY talks be? I had been planning on attending DataStax's big summer conference and I might not be able to get approval for both, so I'd like to hear more about this one. On Wed, Jan 30, 2013 at 12:40 PM, Jonathan Ellis jbel...@gmail.com wrote: ApacheCon North America

Re: Suggestion: Move some threads to the client-dev mailing list

2013-01-30 Thread Rob Coli
On Wed, Jan 30, 2013 at 7:21 AM, Edward Capriolo edlinuxg...@gmail.com wrote: My suggestion: At minimum we should re-route these questions to client-dev or simply say, If it is not part of core Cassandra, you are looking in the wrong place for support +1, I find myself scanning past all those

RE: cluster issues

2013-01-30 Thread S C
I am using DseDelegateSnitch Thanks, SC From: aa...@thelastpickle.com Subject: Re: cluster issues Date: Tue, 29 Jan 2013 20:15:45 +1300 To: user@cassandra.apache.org We can always be proactive in keeping the time in sync. But is there any way to recover from a time drift (in a reactive manner)?

Re: Understanding Virtual Nodes on Cassandra 1.2

2013-01-30 Thread Zhong Li
Are there tickets/documents that explain how data is replicated on Virtual Nodes? If there are multiple tokens on one physical host, is there a chance that two or more tokens chosen by the replication strategy are located on the same host? If a token is moved/removed/added manually, does the Cassandra engine validate the case?

Re: too many warnings of Heap is full

2013-01-30 Thread Bryan Talbot
My guess is that those one or two nodes with the gc pressure also have more rows in your big CF. More rows could be due to imbalanced distribution if you're not using a random partitioner, or from those nodes not yet removing deleted rows which other nodes may have done. JVM heap space is used

Re: too many warnings of Heap is full

2013-01-30 Thread Nate McCall
What's the output of nodetool cfstats for those 2 column families on cassNode2 and cassNode3? And what is the replication factor for this cluster? Per the previous reply, nodetool ring should show each of your nodes with ~16.7% of the data if well balanced. Also, the auto-detection for memory

CASSANDRA-5152

2013-01-30 Thread Yen-Fen_Hsu
I had the same problem with 1.2.0. The problem went away after readline was easy-installed. Regards, Yen-Fen Hsu

Re: SStable Writer and composite key

2013-01-30 Thread aaron morton
This is what a row of your table will look like internally… --- RowKey: id-value = (column=date-value:request-value:, value=, timestamp=1359586739456000) = (column=date-value:request-value:data1, value=64617461312d76616c7565, timestamp=1359586739456000) =

Re: Problem on node join the ring

2013-01-30 Thread aaron morton
erg, that error means it's not really part of the ring. I would try to restart the joining. Shut down the node, and delete everything in /var/lib/data/system. You can leave the data that's already there if you want or delete it. Then try joining again. Cheers - Aaron

Re: too many warnings of Heap is full

2013-01-30 Thread Nate McCall
Your latencies and distribution look fine. How big/what types of queries are you issuing? Are you issuing a lot of large multigets? Also, do either of these column families have secondary indexes? On Wed, Jan 30, 2013 at 2:59 PM, Guillermo Barbero guillermo.barb...@spotbros.com wrote: Iep,

Re: Cass returns Incorrect column data on writes during flushing

2013-01-30 Thread aaron morton
This looks like a bug. Can you create a ticket on https://issues.apache.org/jira/browse/CASSANDRA ? Please include the C* version, the table and insert statements, and whether you can repro it using CQL 3. Thanks Aaron - Aaron Morton Freelance Cassandra Developer New Zealand

Re: why set replica placement strategy at keyspace level ?

2013-01-30 Thread aaron morton
I think a row mutation is isolated now, but is it across column families? Correct they are isolated, but only for an individual CF. By the way, the wiki page really needs updating. You can update if you would like to. Cheers - Aaron Morton Freelance Cassandra Developer New

Re: why set replica placement strategy at keyspace level ?

2013-01-30 Thread Manu Zhang
On Thu 31 Jan 2013 08:55:40 AM CST, aaron morton wrote: I think a row mutation is isolated now, but is it across column families? Correct they are isolated, but only for an individual CF. By the way, the wiki page really needs updating. You can update if you would like to. Cheers

Re: why set replica placement strategy at keyspace level ?

2013-01-30 Thread Edward Capriolo
That should not bother you. For example, if you're doing an hbase scan that crosses two column families, that could end up being two (disk) seeks. Having an API that hides the seeks from you does not give you better performance; it only helps you when you're debating with people that do not

Re: Cassandra pending compaction tasks keeps increasing

2013-01-30 Thread Wei Zhu
Some updates: Since we still have not fully turned on the system, we did something crazy today. We tried to treat the node as a dead one (my boss wants us to practice replacing a dead node before going to full production) and bootstrap it. Here is what we did: * drain the node *

CPU hotspot at BloomFilterSerializer#deserialize

2013-01-30 Thread Takenori Sato
Hi all, We have a situation where the CPU load on some of the nodes in our cluster has spiked occasionally since last November, triggered by requests for rows that reside on two specific sstables. We confirmed the following (when spiked): version: 1.0.7(current) - 0.8.6 - 0.8.5 - 0.7.8

Error when using CQL driver : No indexed columns present in by-columns clause with equals operator

2013-01-30 Thread Dinusha Dilrukshi
Hi All, I have created a column family as follows. (With secondary indexes.) create column family users with comparator=UTF8Type and key_validation_class = 'UTF8Type' and default_validation_class = 'UTF8Type' and column_metadata=[{column_name: full_name, validation_class: UTF8Type},