RE: about the data directory

2011-01-13 Thread raoyixuan (Shandy)
Not exactly. You mean one data will be put in four nodes which have 25%? If does, how about two replica? From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com] Sent: Thursday, January 13, 2011 2:59 PM To: user@cassandra.apache.org Subject: RE: about the data directory I have 4 nodes,

Re: about the data directory

2011-01-13 Thread Peter Schuller
I have 4 nodes, then I create one keyspace (such as FOO) with replication factor = 1 and insert some data. Why can I see the directory /var/lib/Cassandra/data/FOO on every node? As far as I know, I have just one replica. The schema (keyspaces and column families) is global across the cluster

RE: about the data directory

2011-01-13 Thread raoyixuan (Shandy)
I agree with you totally. but I want to know which node is the data kept? I mean which way to know the actual data kept? -Original Message- From: sc...@scode.org [mailto:sc...@scode.org] On Behalf Of Peter Schuller Sent: Thursday, January 13, 2011 4:20 PM To: user@cassandra.apache.org

Re: about the data directory

2011-01-13 Thread Peter Schuller
I agree with you totally. but I want to know which node is the data kept? I mean which way to know the actual data kept? If you're just doing testing, you might 'nodetool flush' each host and then look for the sstable being written. Prior to a flush, it's going to sit in a memtable in memory

RE: about the data directory

2011-01-13 Thread raoyixuan (Shandy)
So you mean just the replica node 's sstable will be changed ,right? If all the replica node broke down, whether the users can read the data? -Original Message- From: sc...@scode.org [mailto:sc...@scode.org] On Behalf Of Peter Schuller Sent: Thursday, January 13, 2011 4:32 PM To:

Re: Usage Pattern : "unique" value of a key.

2011-01-13 Thread David Boxenhorn
It is unlikely that both racing threads will have exactly the same microsecond timestamp at the moment of creating a new user - so if data you read have exactly the same timestamp you used to write data - this is your data. I think this would have to be combined with CL=QUORUM for both write and

Question about fat rows

2011-01-13 Thread Héctor Izquierdo Seliva
Hi everyone. I have a question about data modeling in my application. I have to store items of a customer, and I can do it in one fat row per customer where the column name is the id and the value a json serialized object, or one entry per item with the same layout. This data is updated almost

Re: Requesting data model suggestions

2011-01-13 Thread Thomas Boose
Hello Scott, 6 month later but maybe you are still interested. I wrote an article on the subject of migrating EERD models to Cassandra on the Cassandra wiki. Have a look and tell me what you think of it: http://wiki.apache.org/cassandra/ThomasBoose/EERD%20model%20components%20to%20Ca

Re: Usage Pattern : "unique" value of a key.

2011-01-13 Thread Benoit Perroud
Thanks for your answer. You're right when you say it's unlikely that 2 threads have the same timestamp, but it can. So it could work for user creation, but maybe not on a more write intensive problem. Moreover, we cannot rely on fully time synchronized node in the cluster (but on node

Re: Timeout Errors while running Hadoop over Cassandra

2011-01-13 Thread Jeremy Hanna
On Jan 12, 2011, at 12:40 PM, Jairam Chandar wrote: Hi folks, We have a Cassandra 0.6.6 cluster running in production. We want to run Hadoop (version 0.20.2) jobs over this cluster in order to generate reports. I modified the word_count example in the contrib folder of the cassandra

Re: about the data directory

2011-01-13 Thread Peter Schuller
So you mean just the replica node's sstable will be changed, right? The data will only be written to the nodes that are part of the replica set for the row (with the exception of hinted handoff, but that's a different sstable). If all the replica node broke down, whether the users can read the
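
To make the "replica set for the row" idea concrete, here is a minimal Python sketch of how a key can be mapped to nodes on a token ring. It is a simplified model, not Cassandra's actual code: it assumes an MD5-based partitioner and a simple "next nodes clockwise" placement, and the node names, the evenly spaced tokens, and the helper names `token_for` and `replicas_for` are all hypothetical.

```python
import hashlib
from bisect import bisect_left

RING_SIZE = 2 ** 127  # token space assumed for this sketch


def token_for(key: bytes) -> int:
    """Map a row key to a token: absolute value of the MD5 digest read as a signed integer."""
    return abs(int.from_bytes(hashlib.md5(key).digest(), "big", signed=True))


def replicas_for(key: bytes, ring: dict, replication_factor: int) -> list:
    """Return the nodes assumed to hold `key`: the first node whose token is >= the
    key's token (wrapping around), followed by the next nodes clockwise on the ring."""
    tokens = sorted(ring)
    start = bisect_left(tokens, token_for(key)) % len(tokens)
    count = min(replication_factor, len(tokens))
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(count)]


# Hypothetical 4-node cluster with evenly spaced tokens (25% of the ring each).
RING = {i * RING_SIZE // 4: f"node{i + 1}" for i in range(4)}

if __name__ == "__main__":
    print(replicas_for(b"some-row-key", RING, 1))  # one node holds the row
    print(replicas_for(b"some-row-key", RING, 2))  # the same node plus its clockwise neighbour
```

Under this model, with a replication factor of 1 exactly one node's sstables receive the row, even though the keyspace directory exists on every node because the schema is global.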

Re: java.net.BindException: Cannot assign requested address

2011-01-13 Thread vikram prajapati
Gary Dusbabek gdusbabek at gmail.com writes: On Tue, Nov 3, 2009 at 15:44, mobiledreamers at gmail.com wrote: ERROR - Exception encountered during startup. java.net.BindException: Cannot assign requested address     at sun.nio.ch.Net.bind(Native Method)     at

Welcome committer Jake Luciani

2011-01-13 Thread Jonathan Ellis
The Cassandra PMC has voted to add Jake as a committer. (Jake is also a committer on Thrift.) Welcome, Jake, and thanks for the hard work! -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com

Re: Welcome committer Jake Luciani

2011-01-13 Thread Jake Luciani
Thanks Jonathan and Cassandra PMC! Happy to help Cassandra take over the world! -Jake On Thu, Jan 13, 2011 at 1:41 PM, Jonathan Ellis jbel...@gmail.com wrote: The Cassandra PMC has voted to add Jake as a committer. (Jake is also a committer on Thrift.) Welcome, Jake, and thanks for the

cassandra row cache

2011-01-13 Thread Saket Joshi
Hi, I am running a 15 node cluster ,version 0.6.8, Linux 64bit OS, using mmap I/O, 6GB ram allocated. I have row cache enabled to 8 keys (mean row size is 2KB). I am observing a strange behaviour.. I query for 1.6 Million rows across the cluster and time taken is around 40 mins , I query the

Re: cassandra row cache

2011-01-13 Thread Jonathan Ellis
does the cache size change between 2nd and 3rd time? On Thu, Jan 13, 2011 at 10:47 AM, Saket Joshi sjo...@touchcommerce.com wrote: Hi, I am running a 15 node cluster ,version 0.6.8, Linux 64bit OS, using mmap I/O, 6GB ram allocated. I have row cache enabled to 8 keys (mean row size is

Re: Welcome committer Jake Luciani

2011-01-13 Thread Edward Capriolo
Three cheers! On Thu, Jan 13, 2011 at 1:45 PM, Jake Luciani jak...@gmail.com wrote: Thanks Jonathan and Cassandra PMC! Happy to help Cassandra take over the world! -Jake On Thu, Jan 13, 2011 at 1:41 PM, Jonathan Ellis jbel...@gmail.com wrote: The Cassandra PMC has voted to add Jake as a

RE: cassandra row cache

2011-01-13 Thread Saket Joshi
Yes it does change. -Original Message- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Thursday, January 13, 2011 11:01 AM To: user Subject: Re: cassandra row cache does the cache size change between 2nd and 3rd time? On Thu, Jan 13, 2011 at 10:47 AM, Saket Joshi

Newbie Replication/Cluster Question

2011-01-13 Thread Mark Moseley
I'm just starting to play with Cassandra, so this is almost certainly a conceptual problem on my part, so apologies in advance. I was testing out how I'd do things like bring up new nodes. I've got a simple 2-node cluster with my only keyspace having replication_factor=2. This is on 32-bit Debian

Re: Are you using Phpcassa for any application currently in production? or considering so ?

2011-01-13 Thread Frank LoVecchio
Slow. Look at the Play Framework, http://www.playframework.org/, with a Java client. On Thu, Jan 13, 2011 at 12:17 PM, Ertio Lew ertio...@gmail.com wrote: I need to choose one amongst several client options to work with Cassandra for a serious web application for production environments. I

Re: cassandra row cache

2011-01-13 Thread Ryan King
I'm not sure if this is entirely true, but I *think* older version of cassandra used a version of the ConcurrentLinkedHashmap (which backs the row cache) that used the Second Chance algorithm, rather than LRU, which might explain this non-LRU-like behavior. I may be entirely wrong about this

Bloom filter

2011-01-13 Thread Carlos Sanchez
All, Could someone tell me where (what classes) or what library is Cassandra using for its bloom filters? Thanks Carlos This email message and any attachments are for the sole use of the intended recipients and may contain proprietary and/or confidential information which may be privileged

RE: cassandra row cache

2011-01-13 Thread Saket Joshi
The cache is 800,000 per node , I have 15 nodes in the cluster. I see the cache value increased after the first run, the row cache hit rate was 0 for first run. For second run of the same data , the hit rate increased to 30% but on the third it jumps to 99% -Saket -Original Message-

Re: Newbie Replication/Cluster Question

2011-01-13 Thread Gary Dusbabek
It is impossible to properly bootstrap a new node into a system where there are not enough nodes to satisfy the replication factor. The cluster as it stands doesn't contain all the data you are asking it to replicate on the new node. Gary. On Thu, Jan 13, 2011 at 13:13, Mark Moseley

Re: Bloom filter

2011-01-13 Thread Chris Burroughs
On 01/13/2011 04:07 PM, Carlos Sanchez wrote: Could someone tell me where (what classes) or what library is Cassandra using for its bloom filters? src/java/org/apache/cassandra/utils/BloomFilter.java

Re: Are you using Phpcassa for any application currently in production? or considering so ?

2011-01-13 Thread ian douglas
We use SimpleCassie in production right now. http://code.google.com/p/simpletools-php/wiki/SimpleCassie On 01/13/2011 11:17 AM, Ertio Lew wrote: I need to choose one amongst several client options to work with Cassandra for a serious web application for production environments. I prefer to

Re: cassandra row cache

2011-01-13 Thread Edward Capriolo
Is it possible that you are reading at READ.ONE and that READ.ONE only warms cache on 1 of your three nodes= 20. 2nd read warms another 60%, and by the third read all the replicas are warm? 99% ? This would be true if digest reads were not warming caches. Edward On Thu, Jan 13, 2011 at 4:07

Re: Newbie Replication/Cluster Question

2011-01-13 Thread Mark Moseley
On Thu, Jan 13, 2011 at 1:08 PM, Gary Dusbabek gdusba...@gmail.com wrote: It is impossible to properly bootstrap a new node into a system where there are not enough nodes to satisfy the replication factor.  The cluster as it stands doesn't contain all the data you are asking it to replicate on

python client example

2011-01-13 Thread felix gao
Guys, I just installed python-cassandra 0.6.1 and Thrift 0.5.0 on my machine and I would like to query against also write into a cassandra server. I guess i am pretty weak in google-fu, there isn't any examples for me get started with. Please help me on how to do this. Thanks, Felix

Re: java.net.BindException: Cannot assign requested address

2011-01-13 Thread Aaron Morton
Can you post the settings you have for: listen_address, storage_port, rpc_address, rpc_port? Also the full error stack again; your original email has dropped off. Can you use Cassandra 0.7? Aaron On 14 Jan, 2011, at 05:22 AM, vikram prajapati prajapativik...@hotmail.com wrote: ERROR 11:33:56,246

Storing big objects into columns

2011-01-13 Thread Victor Kabdebon
Dear all, In a project I would like to store big objects in columns, serialized. For example entire images (several KB to several MB), flash animations (several MB) etc... Does someone use Cassandra with those relatively big columns and if yes does it work well? Are there any drawbacks using this

Re: Storing big objects into columns

2011-01-13 Thread Ryan King
On Thu, Jan 13, 2011 at 2:38 PM, Victor Kabdebon victor.kabde...@gmail.com wrote: Dear all, In a project I would like to store big objects in columns, serialized. For example entire images (several KB to several MB), flash animations (several MB) etc... Does someone use Cassandra with those

Re: python client example

2011-01-13 Thread Aaron Morton
Pycassa https://github.com/pycassa/pycassa Has documentation here http://pycassa.github.com/pycassa/ Where does python-cassandra live? Aaron On 14 Jan, 2011, at 11:34 AM, felix gao gre1...@gmail.com wrote: Guys, I just installed python-cassandra 0.6.1 and Thrift 0.5.0 on my machine and I would like to query

Re: python client example

2011-01-13 Thread felix gao
this is where it is stored /opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/ On Thu, Jan 13, 2011 at 2:39 PM, Aaron Morton aa...@thelastpickle.com wrote: Pycassa https://github.com/pycassa/pycassa Has documentation here http://pycassa.github.com/pycassa/

Re: Storing big objects into columns

2011-01-13 Thread Victor Kabdebon
Is there any recommended maximum size for a Column ? (not the very upper limit which is 2Gb) Why is it useful to chunk the content into multiple columns ? Thank you, Victor K. 2011/1/13 Ryan King r...@twitter.com On Thu, Jan 13, 2011 at 2:38 PM, Victor Kabdebon victor.kabde...@gmail.com

Re: python client example

2011-01-13 Thread Aaron Morton
Sorry, I meant where did you get python-cassandra from on the web. Can you use Pycassa, even just as a learning experience? There is a tutorial here http://pycassa.github.com/pycassa/tutorial.html A On 14 Jan, 2011, at 11:42 AM, felix gao gre1...@gmail.com wrote: this is where it is

Re: python client example

2011-01-13 Thread Tyler Hobbs
Right, python-cassandra just provides the raw Thrift API, which is no fun at all. You should start out with pycassa. - Tyler On Thu, Jan 13, 2011 at 4:45 PM, Aaron Morton aa...@thelastpickle.com wrote: Sorry, I meant where did you get python-cassandra from on the web. Can you use Pycassa,

Re: Storing big objects into columns

2011-01-13 Thread Ryan King
On Thu, Jan 13, 2011 at 2:44 PM, Victor Kabdebon victor.kabde...@gmail.com wrote: Is there any recommended maximum size for a Column ? (not the very upper limit which is 2Gb) Why is it useful to chunk the content into multiple columns ? I think you're going to have to do some tests yourself.
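
As an illustration of what "chunking the content into multiple columns" could look like on the client side, here is a minimal Python sketch. It only splits and reassembles bytes; it does not talk to Cassandra. The helper names `chunk_columns` and `reassemble`, the `chunk-NNNNNNNN` column naming, and the 64 KB default are assumptions made for the example, not recommendations from the thread, which suggests testing sizes yourself.

```python
def chunk_columns(blob: bytes, chunk_size: int = 64 * 1024) -> dict:
    """Split `blob` into {column_name: chunk}. Column names are zero-padded
    chunk indexes so they sort in order under a byte-wise or ASCII comparator."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {
        f"chunk-{index:08d}": blob[offset:offset + chunk_size]
        for index, offset in enumerate(range(0, len(blob), chunk_size))
    }


def reassemble(columns: dict) -> bytes:
    """Concatenate the chunks back together in column-name order."""
    return b"".join(columns[name] for name in sorted(columns))


if __name__ == "__main__":
    image = bytes(range(256)) * 1000          # stand-in for a serialized object
    columns = chunk_columns(image)
    print(len(columns), "columns")
    print(reassemble(columns) == image)
```

The resulting dictionary could then be written as the columns of a single row by whichever client is in use; smaller chunks keep each individual read or write bounded in size, at the cost of more columns per object.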

Re: Storing big objects into columns

2011-01-13 Thread Victor Kabdebon
Ok thank you very much for this information! If somebody has more insights on this matter I am still interested! Victor K. 2011/1/13 Ryan King r...@twitter.com On Thu, Jan 13, 2011 at 2:44 PM, Victor Kabdebon victor.kabde...@gmail.com wrote: Is there any recommended maximum size for a

Re: python client example

2011-01-13 Thread Aaron Morton
Ah, i get it now. The python code generated from running ant gen-thrift-py . IMHO Start with Pycassa *even* if you want to go your own way later. It solves a lot of problems for you and will save you time. A On 14 Jan, 2011, at 11:46 AM, Tyler Hobbs ty...@riptano.com wrote: Right, python-cassandra just

Re: python client example

2011-01-13 Thread felix gao
Thanks guys, playing around with pycassa right now. seems pretty good. On Thu, Jan 13, 2011 at 2:56 PM, Aaron Morton aa...@thelastpickle.com wrote: Ah, i get it now. The python code generated from running ant gen-thrift-py . IMHO Start with Pycassa *even* if you want to go your own way later.

Re: cassandra row cache

2011-01-13 Thread Jonathan Ellis
On Thu, Jan 13, 2011 at 2:00 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Is it possible that you are reading at READ.ONE and that READ.ONE only warms cache on 1 of your three nodes= 20. 2nd read warms another 60%, and by the third read all the replicas are warm? 99% ? This would be true

RE: java.net.BindException: Cannot assign requested address

2011-01-13 Thread raoyixuan (Shandy)
It's ip address problem, whether you ip address had changed? Please confirm it and restart the Cassandra. -Original Message- From: vikram prajapati [mailto:prajapativik...@hotmail.com] Sent: Friday, January 14, 2011 12:23 AM To: user@cassandra.apache.org Subject: Re:

Cassandra freezes under load when using libc6 2.11.1-0ubuntu7.5

2011-01-13 Thread Mike Malone
Hey folks, We've discovered an issue on Ubuntu/Lenny with libc6 2.11.1-0ubuntu7.5 (it may also affect versions between 2.11.1-0ubuntu7.1 and 2.11.1-0ubuntu7.4). The bug affects systems when a large number of threads (or processes) are created rapidly. Once triggered, the system will become

Re: Old data not indexed

2011-01-13 Thread Tan Yeh Zheng
Hi all, More specifically, I added two rows of data. Row A (users['A']['state']='UT') is added before I add indexing to the column and the Row B (users['B']['state']='UT') after indexing. When I call get_indexed_slices (state='UT') to query the two rows, only the Row B is returned. It's as if

Re: about the data directory

2011-01-13 Thread Edward Capriolo
On Thu, Jan 13, 2011 at 7:56 PM, raoyixuan (Shandy) raoyix...@huawei.com wrote: I have some confused, why do the users can read the data in all nodes? I mean the data just be kept in the replica, how to achieve it? -Original Message- From: sc...@scode.org [mailto:sc...@scode.org] On

RE: about the data directory

2011-01-13 Thread raoyixuan (Shandy)
As an administrator, I want to know why I can read the data from any node, because the data is only kept on the replicas. Can you tell me? Thanks in advance. -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Friday, January 14, 2011 9:44 AM To:

Is there any way I could use keys of other rows as column names that could be sorted according to time ?

2011-01-13 Thread Aklin_81
I would like to keep the reference of other rows as names of super column and sort those super columns according to time. Is there any way I could implement that ? Thanks in advance!

Re: Is there any way I could use keys of other rows as column names that could be sorted according to time ?

2011-01-13 Thread Aaron Morton
You could make the time a fixed-width integer and prefix your row keys with it, then set the comparator to ascii or utf. Some issues: - Will you have time collisions? - Not sure what you are storing in the super columns, but there are
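
The fixed-width prefix suggestion can be sketched in Python as follows. This is only an illustration of why the width must be fixed when names are compared as text; the helper names `time_prefixed_name` and `split_name`, the `:` separator, and the 20-digit width are assumptions for the example, not part of any Cassandra API.

```python
def time_prefixed_name(timestamp_micros: int, row_key: str, width: int = 20) -> str:
    """Build a column name that sorts chronologically under a text comparator:
    a zero-padded, fixed-width timestamp followed by the referenced row key."""
    if not 0 <= timestamp_micros < 10 ** width:
        raise ValueError("timestamp does not fit in the fixed width")
    return f"{timestamp_micros:0{width}d}:{row_key}"


def split_name(name: str) -> tuple:
    """Recover (timestamp_micros, row_key) from a name built above."""
    prefix, _, row_key = name.partition(":")
    return int(prefix), row_key


if __name__ == "__main__":
    # Without padding, text ordering disagrees with numeric ordering.
    print(sorted(["9", "10"]))
    # With padding, text ordering matches time ordering.
    names = [time_prefixed_name(t, k) for t, k in [(10, "rowB"), (9, "rowA")]]
    print(sorted(names))
```

Appending the row key after the timestamp also keeps two references written at the same timestamp from overwriting each other, as long as they point at different rows.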

limiting columns in a row

2011-01-13 Thread mike dooley
hi, the time-to-live feature in 0.7 is very nice and it made me want to ask about a somewhat similar feature. i have a stream of data consisting of entities and associated samples. so i create a row for each entity and the columns in each row contain the samples for that entity. when i

Re: Cassandra freezes under load when using libc6 2.11.1-0ubuntu7.5

2011-01-13 Thread Erik Onnen
May or may not be related but I thought I'd recount a similar experience we had in EC2 in hopes it helps someone else. As background, we had been running several servers in a 0.6.8 ring with no Cassandra issues (some EC2 issues, but none related to Cassandra) on multiple EC2 XL instances in a