Re: newbie question: how do I know the total number of rows of a cf?

2011-03-31 Thread Sheng Chen
I just found an estimateKeys() method of the ColumnFamilyStoreMBean. Is there any indication about how it works? Sheng 2011/3/28 Sheng Chen chensheng2...@gmail.com Hi all, I want to know how many records I am holding in Cassandra, just like count(*) in SQL. What can I do? Thank you. Sheng

Re: Naming issue on nodetool repair command

2011-03-31 Thread Peter Schuller
Would you Cassandra team think to add an alias name for the nodetool repair command? That thought has crossed my mind lately too; particularly in one of the recent threads. The problem seems analogous to 'fsck', and the distinction between fully expected by-design behavior needing fsck/repair is

Re: Two column families or One super column family?

2011-03-31 Thread T Akhayo
Hi Aaron, Thank you for your reply, I appreciate the suggestions you made. Yesterday I managed to get everything (our main read) in one CF, with the use of a structure in a value like you suggested. Designing a new data model is different from what I'm used to, but if you keep in mind that you

Re: Using RowMutations with super columns

2011-03-31 Thread aaron morton
The CassandraBulkLoader example is written to use Super Columns, so seems odd. Do you have the rest of the error stack ? Aaron On 31 Mar 2011, at 04:54, George Ciubotaru wrote: Hello, I’m using CassandraBulkLoader.java

Re: add new data directory to cassandra

2011-03-31 Thread aaron morton
AFAIK Cassandra will just pick the directory with the most space. Also AFAIK using multiple directories should only be considered a safety valve to fix problems such as the one you describe; see http://www.mail-archive.com/user@cassandra.apache.org/msg07874.html Aaron On 31 Mar 2011, at

unsubscribe

2011-03-31 Thread Dario Bravo
-- Darío Bravo

Re: Cassandra take a snapshot after a column family update

2011-03-31 Thread Roberto Bentivoglio
Ok, we'll do it for sure! Thanks, Roberto On 31 March 2011 14:56, aaron morton aa...@thelastpickle.com wrote: Next time it happens take a note of the snapshot folder, different processes name the folder differently. It may help track down what created the snapshot. Cheers Aaron On 31

changing replication strategy and effects on replica nodes

2011-03-31 Thread Jonathan Colby
From my understanding of replica copies, cassandra picks which nodes to replicate the data based on replication strategy, and those same replica partner nodes are always used according to token ring distribution. If you change the replication strategy, does cassandra pick new nodes to

Re: Two column families or One super column family?

2011-03-31 Thread Edward Capriolo
On Thu, Mar 31, 2011 at 3:52 AM, T Akhayo t.akh...@gmail.com wrote: Hi Aaron, Thank you for your reply, I appreciate the suggestions you made. Yesterday I managed to get everything (our main read) in one CF, with the use of a structure in a value like you suggested. Designing a new data

Re: How to determine if repair need to be run

2011-03-31 Thread Jonathan Colby
silly question, would every cassandra installation need to have manual repairs done on it? It would seem cassandra's read repair and regular compaction would take care of keeping the data clean. Am I missing something? On Mar 30, 2011, at 7:46 PM, Peter Schuller wrote: I just wanted to

Re: Working backwards from production to staging/dev

2011-03-31 Thread ian douglas
Thanks Edward, Anyone able to provide some answers for the other questions? On 03/26/2011 07:25 AM, Edward Capriolo wrote: On Fri, Mar 25, 2011 at 2:11 PM, ian douglas i...@armorgames.com wrote: On 03/25/2011 10:12 AM, Jonathan Ellis wrote: On Fri, Mar 25, 2011 at 11:59 AM, ian

Re: How to determine if repair need to be run

2011-03-31 Thread Peter Schuller
silly question, would every cassandra installation need to have manual repairs done on it? It would seem cassandra's read repair and regular compaction would take care of keeping the data clean. Am I missing something? See my previous posts in this thread for the distinct reasons to run

Re: How to determine if repair need to be run

2011-03-31 Thread mcasandra
If I am not wrong, node repair needs to be run on all the nodes in a staggered manner. It is required to take care of tombstones. Please correct me, team, if I am wrong :) See Distributed Deletes: http://wiki.apache.org/cassandra/Operations
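The rule of thumb behind this advice, as described in the Distributed Deletes section linked above, is that each node should be repaired at least once within every GCGraceSeconds window, so that tombstones reach all replicas before they are purged. A minimal sketch of that check follows. It assumes the default GCGraceSeconds of 864000 (10 days); the function name and the per-node record of last completed repairs are hypothetical bookkeeping kept by the operator, not something Cassandra provides.

```python
DEFAULT_GC_GRACE_SECONDS = 864000  # 10 days; assumed default value


def nodes_overdue_for_repair(last_repair_times, now,
                             gc_grace_seconds=DEFAULT_GC_GRACE_SECONDS):
    """Return the nodes whose last completed repair is too old.

    last_repair_times: dict mapping node name -> Unix timestamp of the last
    completed repair (hypothetical records kept by the operator).
    now: current Unix timestamp.
    """
    return sorted(
        node for node, finished in last_repair_times.items()
        if now - finished >= gc_grace_seconds
    )
```

In practice one would schedule repairs well inside the window rather than wait for a node to show up in this list.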

pycassa refresh server_list

2011-03-31 Thread A J
In the pycassa.pool.ConnectionPool class, I can specify all the nodes in the server_list parameter. But over time, when nodes get decommissioned and new nodes with new IPs get added, how can the server_list parameter be refreshed? Do I have to modify it manually, or is there a way to update the list

Netstats out of sync?

2011-03-31 Thread buddhasystem
I'm rebalancing a cluster of 2 nodes at this point. Netstats on the source node reports progress of the stream, whereas on the receiving end netstats states that progress = 0. Did anyone see that? Do I need both nodes listed as seeds in cassandra.yaml? TIA/

Re: pycassa refresh server_list

2011-03-31 Thread Tyler Hobbs
ConnectionPool has a set_server_list() method that you can use to update the list of servers. (It appears this method did not make it into the docs; I'll make sure it gets in there.) Pycassa doesn't make any attempt to update the server list automatically right now. By the way, there is a
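Since the pool does not refresh itself, the caller has to push a new list. The following is a minimal sketch of such a helper, not pycassa code: `refresh_server_list` is a hypothetical name, `pool` is assumed to be any object exposing the set_server_list() method mentioned above, and how the live servers are discovered is left open.

```python
def refresh_server_list(pool, live_servers):
    """Push a de-duplicated, sorted server list into the pool.

    pool: assumed to expose set_server_list(), as described above for
    pycassa's ConnectionPool.
    live_servers: iterable of "host:port" strings gathered by the caller.
    """
    servers = sorted(set(live_servers))
    if not servers:
        # Guard against wiping the pool when discovery returns nothing.
        raise ValueError("refusing to set an empty server list")
    pool.set_server_list(servers)
    return servers
```

With a real pool this might be called periodically, for example `refresh_server_list(pool, ["10.0.0.1:9160", "10.0.0.2:9160"])`; the addresses are placeholders.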

Re: Revised: Data Modeling advise for Cassandra 0.8 (added #8)

2011-03-31 Thread Drew Kutcharian
Thanks Aaron, I have already checked out Twissandra. I was mainly looking to see how Secondary Indexes can be used and how they affect Data Modeling. There doesn't seem to be a lot of coverage on them. In addition, I couldn't tell what kind of Partitioner Twissandra is using and why. cheers,

Not able to set ZERO consistency level

2011-03-31 Thread Prasanna Rajaperumal
Hi, I am dealing with reporting on not-so-important data and I am okay with data being lost. I would like to minimize the time taken for the actual data insert. I am using Cassandra 0.7.4. If it matters, I am using Hector to connect to Cassandra. ZERO consistency level in Thrift Generated code

Re: Not able to set ZERO consistency level

2011-03-31 Thread Peter Schuller
Only the following Levels are provided, I am wondering if the ZERO consistency level is removed in Cassandra 0.7.X ? Yes, it's gone. If so, Could you please explain why was it removed and what is the best option I have given my context. https://issues.apache.org/jira/browse/CASSANDRA-1607

Re: Not able to set ZERO consistency level

2011-03-31 Thread Edward Capriolo
On Thu, Mar 31, 2011 at 2:53 PM, Peter Schuller peter.schul...@infidyne.com wrote: Only the following Levels are provided, I am wondering if the ZERO consistency level is removed in Cassandra 0.7.X ? Yes, it's gone. If so, Could you please explain why was it removed and what is the best

Re: How to determine if repair need to be run

2011-03-31 Thread Jonathan Colby
Peter - Thanks a lot for elaborating on repairs. Still, it's a bit fuzzy to me why it is so important to run a repair before the GCGraceSeconds kicks in. Does this mean a delete does not get replicated? In other words when I delete something on a node, doesn't cassandra set tombstones

Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread buddhasystem
I just configured a cluster of two nodes -- do these token values make sense? The reason I'm asking is that so far I don't see load balancing happening, judging from performance. Address Status State Load Owns Token

Re: How to determine if repair need to be run

2011-03-31 Thread Peter Schuller
Thanks a lot for elaborating on repairs. Still, it's a bit fuzzy to me why it is so important to run a repair before the GCGraceSeconds kicks in. Does this mean a delete does not get replicated? In other words when I delete something on a node, doesn't cassandra set tombstones on

too many open files - maybe a fd leak in indexslicequeries

2011-03-31 Thread Roland Gude
I experience something that looks exactly like https://issues.apache.org/jira/browse/CASSANDRA-1178 on Cassandra 0.7.3 when using index slice queries (lots of them), crashing multiple nodes and rendering the cluster useless. But I have no clue where to look if index queries still leak fds. Does

Re: Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread Eric Gilmore
A script that I have says the following: $ python ctokens.py How many nodes are in your cluster? 2 node 0: 0 node 1: 85070591730234615865843651857942052864 The first token should be zero, for the reasons discussed here:
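The token values quoted above can be reproduced with a short calculation. This is a minimal sketch, not the ctokens.py script itself (its source is not shown here). It assumes RandomPartitioner, whose token space runs from 0 to 2**127, and evenly spaced nodes; the function name `balanced_tokens` is hypothetical.

```python
def balanced_tokens(node_count):
    """Return evenly spaced initial tokens.

    Assumes RandomPartitioner, i.e. a token ring of size 2**127.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    ring_size = 2 ** 127
    return [i * ring_size // node_count for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(balanced_tokens(2)):
        print("node %d: %d" % (index, token))
```

For a two-node cluster this yields 0 and 2**126, matching the two values in the message above.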

Re: How to determine if repair need to be run

2011-03-31 Thread Eric Gilmore
Peter, I want to join everyone else thanking you for helping out so much with this thread, and especially for pointing out the problems with the DS docs on this topic. We have some corrections posted today, and will keep looking to improve the information. On Thu, Mar 31, 2011 at 3:11 PM, Peter

Re: Revised: Data Modeling advise for Cassandra 0.8 (added #8)

2011-03-31 Thread aaron morton
It does not have a yaml file, so I am assuming it's the default Random Partitioner. Aaron On 1 Apr 2011, at 04:51, Drew Kutcharian wrote: Thanks Aaron, I have already checked out Twissandra. I was mainly looking to see how Secondary Indexes can be used and how they affect Data Modeling.

RTG/MRTG/Cricket replacement using Cassandra?

2011-03-31 Thread Aaron Turner
I've been looking at replacing our PostgreSQL backend for RTG (an SNMP-based polling and graphing solution for network traffic/ports) with something using Cassandra in order to solve our scalability and redundancy requirements. Based on a lot of what I've read, Cassandra is an ideal data store for

Re: Attempt to assign id to existing column family.

2011-03-31 Thread aaron morton
There is no reason to change the RF on the system keyspace; it should probably not be allowed. The system keyspace uses a LocalPartitioner and its data is not replicated through the same mechanism as a user keyspace. Aaron On 31 Mar 2011, at 10:22, Jeremy Stribling wrote: On 03/30/2011

Re: RTG/MRTG/Cricket replacement using Cassandra?

2011-03-31 Thread David Hawthorne
I know cloudkick is doing something like this, and we're developing our own in-house method, but it would be nice for there to be a generically-available package that would do this. Lately I've been wishing that someone would take graphite (written in python) and put the frontend on top of

Re: Cassandra error Insufficient space to compact

2011-03-31 Thread aaron morton
Where are the connection refused messages? Are they client side? Can you connect to the cluster with nodetool and run the ring command? Aaron On 31 Mar 2011, at 11:44, Anurag Gujral wrote: I restarted the cassandra node with more disks when I try to connect to cassandra I get connection

Re: RTG/MRTG/Cricket replacement using Cassandra?

2011-03-31 Thread Ryan King
We have a solution for time series data on cassandra at Twitter that we'd like to open source, but it requires 0.8/trunk so we're not going to release it until that's stable. See http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011 -ryan On Thu, Mar 31, 2011

Re: newbie question: how do I know the total number of rows of a cf?

2011-03-31 Thread aaron morton
It iterates over all the SSTables on disk and estimates the number of keys by looking at how big the index is. It does not count the actual keys. aaron On 31 Mar 2011, at 17:46, Sheng Chen wrote: I just found an estimateKeys() method of the ColumnFamilyStoreMBean. Is there any indication
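The estimate described above can be pictured as follows. This is an illustrative sketch, not Cassandra's code: it assumes each SSTable keeps a sampled index holding one entry per `index_interval` keys (128 is assumed here), so the estimate is the number of sampled entries times the interval, summed over all SSTables. The function and constant names are hypothetical.

```python
ASSUMED_INDEX_INTERVAL = 128  # assumption: one index sample per 128 keys


def estimate_keys(index_sample_counts,
                  index_interval=ASSUMED_INDEX_INTERVAL):
    """Estimate the total key count from per-SSTable index sample counts.

    index_sample_counts: iterable with the number of sampled index entries
    in each SSTable (hypothetical input for illustration).
    """
    return sum(count * index_interval for count in index_sample_counts)
```

Under these assumptions a row key that is present in several SSTables is counted once per SSTable, which is one reason the result is an estimate rather than an exact count.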

Re: RTG/MRTG/Cricket replacement using Cassandra?

2011-03-31 Thread Paul Choi
Just finished looking at the slides. It looks awesome! On 3/31/11 4:19 PM, Ryan King r...@twitter.com wrote: We have a solution for time series data on cassandra at Twitter that we'd like to open source, but it requires 0.8/trunk so we're not going to release it until that's stable. See

Re: Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread Edward Capriolo
On Thu, Mar 31, 2011 at 6:15 PM, Eric Gilmore e...@datastax.com wrote: A script that I have says the following: $ python ctokens.py How many nodes are in your cluster? 2 node 0: 0 node 1: 85070591730234615865843651857942052864 The first token should be zero, for the reasons discussed here:

Re: RTG/MRTG/Cricket replacement using Cassandra?

2011-03-31 Thread Aaron Turner
On Thu, Mar 31, 2011 at 4:19 PM, Ryan King r...@twitter.com wrote: We have a solution for time series data on cassandra at Twitter that we'd like to open source, but it requires 0.8/trunk so we're not going to release it until that's stable. See

nodetool cfstathistogram error

2011-03-31 Thread mcasandra
Cassandra 0.7.4: nodetool -h `hostname` cfhistograms system schema Exception in thread main java.lang.reflect.UndeclaredThrowableException at $Proxy5.getRecentReadLatencyHistogramMicros(Unknown Source) at org.apache.cassandra.tools.NodeCmd.printCfHistograms(NodeCmd.java:452)

Re: nodetool cfstathistogram error

2011-03-31 Thread mcasandra
It looks like if I use system schema it fails. Is it because of LocalPartitioner? I ran with other keyspace and got following output. Offset SSTables Write Latency Read Latency Row Size Column Count 1 0 0 0 0 0 2 0 0 0 0 0 179 0 0 0 320 320 Can someone please help me understand the output in

Re: Does anyone build 0.7.4 on IDEA?

2011-03-31 Thread Maki Watanabe
ant on my command line had completed without error. Next I tried to build cassandra 0.7.4 in eclipse, and had luck. So I'll explore cassandra code with eclipse, rather than IDEA. maki 2011/3/31 Maki Watanabe watanabe.m...@gmail.com: Not yet. I'll try. maki 2011/3/31 Tommy Tynjä

Re: Ditching Cassandra

2011-03-31 Thread Edward Capriolo
Gregori, Congrats on writing the fud-liest post of the month award. Firstly if you don't like updates give up on computers and software. Especially give up on anything that has to do with NoSQL because it is fast evolving. If you think you have a problem with the cassandra api, then what you

A Simple scenario, Help needed

2011-03-31 Thread Prasanna Rajaperumal
Hi All, I am trying out a very simple scenario and I don't seem to get it working. It would be great if I am pointed to some things here. I have set up a 2 node cluster, cassandra.yaml being the default and same for each other than the seed: being each other and I have set the Thrift RPC

Re: nodetool cfstathistogram error

2011-03-31 Thread Edward Capriolo
On Thu, Mar 31, 2011 at 8:25 PM, mcasandra mohitanch...@gmail.com wrote: It looks like if I use system schema it fails. Is it because of LocalPartitioner? I ran with other keyspace and got following output. Offset SSTables Write Latency Read Latency Row Size Column Count 1 0 0 0 0 0 2 0 0

Re: Node added, no performance boost -- are the tokens correct?

2011-03-31 Thread buddhasystem
Yup, I screwed up the token setting, my bad. Now, I moved the tokens. I still observe that read latency deteriorated with 3 machines vs original one. Replication factor is 1, Cassandra version 0.7.2 (didn't have time to upgrade as I need results by this weekend). Key and row caching was disabled

Endless minor compactions after heavy inserts

2011-03-31 Thread Sheng Chen
I've got a single node of cassandra 0.7.4, and I used the java stress tool to insert about 100 million records. The inserts took about 6 hours (45k inserts/sec) but the following minor compactions last for 2 days and the pending compaction jobs are still increasing. From jconsole I can read the

Re: Requests stuck on production cluster

2011-03-31 Thread Jonathan Ellis
What's going on in the logs? CPU? i/o? On Thu, Mar 31, 2011 at 4:20 AM, Or Yanay o...@peer39.com wrote: Hi all, My production cluster reads got stuck. The ring gives: Address Status State Load Owns Token

Re: too many open files - maybe a fd leak in indexslicequeries

2011-03-31 Thread Jonathan Ellis
Index queries (ColumnFamilyStore.scan) don't do any low-level i/o themselves, they go through CFS.getColumnFamily, which is what normal row fetches also go through. So if there is a leak there it's unlikely to be specific to indexes. What is your open-file limit (remember that sockets count