How to increase cassandra's performance in read?

2010-04-20 Thread yangfeng
I read 10 column families by key, and one column family has 30 columns. I use multigetSlice once to get the 10 column families, but the performance is very poor. Does anyone have other ideas for improving the performance?

RE: Cassandra Java Client

2010-04-20 Thread Dop Sun
Hi, I have downloaded hector-0.6.0-10.jar. As you mentioned, it has a good implementation of connection pooling and JMX counters. What I’m doing is: using Hector to create the Cassandra client (to be specific: borrow_client(url, port)). And my understanding is: in this way, the Jassandra

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
I too am seeing very slow performance while testing worst case scenarios of 1 key leading to 1 supercolumn and 1 column beyond that. Key - SuperColumn - 1 Column (of ~ 500 bytes) Drive utilization is 80-90% and I'm only dealing with 50-70 million rows. (With NO swapping) So far, I've found

Tool for managing cluster nodes?

2010-04-20 Thread Joost Ouwerkerk
What are people using to manage Cassandra cluster nodes? i.e. to start, stop, copy config files, etc. I'm using cssh and wondering if there is a better way... Joost.

Re: Tool for managing cluster nodes?

2010-04-20 Thread Roger Schildmeijer
dancer's shell / distributed shell http://www.netfort.gr.jp/~dancer/software/dsh.html.en On 20 apr 2010, at 17.18em, Joost Ouwerkerk wrote: What are people using to manage Cassandra cluster nodes? i.e. to start, stop, copy config files, etc. I'm using cssh and wondering if there is a

Re: How to increase cassandra's performance in read?

2010-04-20 Thread Jonathan Ellis
How many columns are in the supercolumn total? In super column families there is a third level of subcolumns; these are not indexed, and any request for a subcolumn deserializes _all_ the subcolumns in that supercolumn. See http://wiki.apache.org/cassandra/CassandraLimitations On Tue, Apr 20, 2010 at

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
Sorry, I didn't answer your question in my response, I have at this point: Key(ID) When/Where SuperColumn Tag: and Columns {Data: One Value (not yet written, tags, flags)} Under some keys (very small #) there will be 2 values like: Key(ID) When/Where SuperColumn Tag: and Columns

Re: How to increase cassandra's performance in read?

2010-04-20 Thread Jonathan Ellis
Not all the data associated w/ the key is brought into memory, just all the data associated w/ the supercolumns being queried. Supercolumns are so you can update a smallish number of subcolumns independently (e.g. when denormalizing an entire narrow row, usually with a finite set of columns). If

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
To make sure I'm clear on what you are saying: Are the Individual Emails in the example below, Supercolumns and the {body, header, tags...} the subcolumns? Is that a sane data layout for an email system? Where the Supercolumn identifier is the conversation label Sorry to be so daft, but

Re: Modelling assets and user permissions

2010-04-20 Thread tsuraan
Suppose I have a CF that holds some sort of assets that some users of my program have access to, and that some do not.  In SQL-ish terms it would look something like this: TABLE Assets (  asset_id serial primary key,  ... ); TABLE Users (  user_id serial primary key,  user_name text

Filters

2010-04-20 Thread Christian Torres
Hello! Is there any way to do filters (WHEREs) in Cassandra, or do I have to manage it myself? For example: I have a ColumnFamily with a column in each row whose value is a state... Public or Private, so I want to filter all rows that are private and also the public ones in other form... Beside

Re: 0.6.1 insert 1B rows, crashed when using py_stress

2010-04-20 Thread Tatu Saloranta
On Mon, Apr 19, 2010 at 7:12 PM, Brandon Williams dri...@gmail.com wrote: On Mon, Apr 19, 2010 at 9:06 PM, Schubert Zhang zson...@gmail.com wrote: 2. Reject the request when short of resources, instead of throwing OOME and exiting (crashing). Right, that is the crux of the problem. It will be

Re: Re: Modelling assets and user permissions

2010-04-20 Thread charleswoerner
The short answer as to what people normally do is that they use a relational database for something like this. I'm curious as to how you would have so many asset / user permissions that you couldn't use a standard relational database to model them. Is this some sort of multi-tenant system

Re: Tool for managing cluster nodes?

2010-04-20 Thread B. Todd Burruss
http://sourceforge.net/projects/clusterssh/ Roger Schildmeijer wrote: dancer's shell / distributed shell http://www.netfort.gr.jp/~dancer/software/dsh.html.en On 20 apr 2010, at 17.18em, Joost Ouwerkerk wrote: What are people using to manage Cassandra cluster nodes? i.e. to

Re: Filters

2010-04-20 Thread Christian Torres
Mmmm... According to this doc http://wiki.apache.org/cassandra/API#get_slice that a developer mailed to me, it's possible!! I'm sending it to you as a reference. On Tue, Apr 20, 2010 at 11:17 AM, Mark Jones mjo...@imagehawk.com wrote: You will have to pull the columns and filter yourself. *From:*

Re: Cassandra Java Client

2010-04-20 Thread Nathan McCall
Dop, Thank you for trying out hector. I think you have the right approach for using it with your project. Feel free to ping us directly regarding Hector on either of these mailing lists as appropriate: http://wiki.github.com/rantav/hector/mailing-lists Cheers, -Nate On Tue, Apr 20, 2010 at 7:11

Re: Cassandra Java Client

2010-04-20 Thread Ran Tavory
great, I'm happy you found hector useful and reused it in your client. On Tue, Apr 20, 2010 at 5:11 PM, Dop Sun su...@dopsun.com wrote: Hi, I have downloaded hector-0.6.0-10.jar. As you mentioned, it has good implementation for the connection pooling, JMX counters. What I’m doing is:

Re: Filters

2010-04-20 Thread Miguel Verde
http://wiki.apache.org/cassandra/API#get_slice get_slice retrieves the values for either (a) a list of column names or (b) a range of columns, depending on the SlicePredicate you use. It does not allow you to filter a la SQL's WHERE. You would need to create your own index to do so, at least
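The "create your own index" advice above can be sketched as follows. This is a minimal in-memory illustration, not Cassandra client code: the two dictionaries stand in for a data column family and a hand-maintained index column family, and the class name, method names, and the "state" column are assumptions made up for the example.

```python
class IndexedStore:
    """Toy model of a data CF plus a hand-built index CF keyed by state."""

    def __init__(self):
        self.rows = {}      # data CF: row key -> {column name: value}
        self.by_state = {}  # index CF: state value -> {row key: ""}

    def insert(self, key, columns):
        """Write a row and keep the index row for its 'state' column in step."""
        old_state = self.rows.get(key, {}).get("state")
        new_state = columns.get("state")
        if old_state is not None and old_state != new_state:
            # Remove the stale index entry before writing the new one.
            self.by_state.get(old_state, {}).pop(key, None)
        self.rows[key] = dict(columns)
        if new_state is not None:
            self.by_state.setdefault(new_state, {})[key] = ""

    def keys_with_state(self, state):
        """Answer 'WHERE state = ?' with a single index-row lookup."""
        return sorted(self.by_state.get(state, {}))


store = IndexedStore()
store.insert("doc1", {"state": "Public"})
store.insert("doc2", {"state": "Private"})
print(store.keys_with_state("Private"))
```

The cost of this design is that every write touches two places, and the application is responsible for keeping them consistent.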

Re: Re: Modelling assets and user permissions

2010-04-20 Thread tsuraan
I'm curious as to how you would have so many asset / user permissions that you couldn't use a standard relational database to model them. Is this some sort of multi-tenant system where you're providing some generalized asset check-out mechanism to many, many customers? Even so, I'm not sure

Delete row

2010-04-20 Thread Sonny Heer
How do I delete a row using the BMT method? Do I simply do a mutate with the column delete flag set to true? Thanks.

Re: cleaning house

2010-04-20 Thread Benjamin Black
Are you deleting data through the API or just doing a bunch of inserts and then running a compaction? The latter will not result in anything to clean up since data must be explicitly deleted. b On Tue, Apr 20, 2010 at 10:33 AM, B. Todd Burruss bburr...@real.com wrote: i'm trying to draw some

Re: cleaning house

2010-04-20 Thread Jonathan Ellis
Added to http://wiki.apache.org/cassandra/MemtableSSTable: SSTables that are obsoleted by a compaction are deleted asynchronously when the JVM performs a GC. You can force a GC from jconsole if you like, but this is not necessary; Cassandra will force one itself if it detects that it is low on

Re: cleaning house

2010-04-20 Thread B. Todd Burruss
i have done no deletes, just inserts. so you are correct, there isn't any data to cleanup. however when i run some of the cleanup and/or compaction tasks the space used on disk actually grows, and i would like to force any unneeded files to be removed. as i write this, jonathan has

Re: How to increase cassandra's performance in read?

2010-04-20 Thread Benjamin Black
I can't answer for its sanity, but I would not do it that way. I'd have a CF for Emails, with 1 email per row, and another CF for UserEmails with per-user index rows referencing the Emails rows. b On Tue, Apr 20, 2010 at 9:44 AM, Mark Jones mjo...@imagehawk.com wrote: To make sure I'm clear
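The two-column-family layout suggested above (one row per email, plus per-user index rows that reference those rows) can be sketched like this. It is an in-memory illustration only; the class, the method names, and the column names are hypothetical and not taken from the thread.

```python
class Mailbox:
    """Toy model of an Emails CF and a UserEmails index CF."""

    def __init__(self):
        self.emails = {}       # Emails CF: email id -> {column name: value}
        self.user_emails = {}  # UserEmails CF: user id -> {email id: ""}

    def deliver(self, email_id, recipients, columns):
        """Store the email once, then add a reference in each recipient's index row."""
        self.emails[email_id] = dict(columns)
        for user in recipients:
            self.user_emails.setdefault(user, {})[email_id] = ""

    def inbox(self, user):
        """One read of the user's index row, then one lookup per referenced email."""
        email_ids = sorted(self.user_emails.get(user, {}))
        return [self.emails[email_id] for email_id in email_ids]


box = Mailbox()
box.deliver("m1", ["alice", "bob"], {"subject": "hello", "body": "..."})
box.deliver("m2", ["alice"], {"subject": "again", "body": "..."})
print([email["subject"] for email in box.inbox("alice")])
```

Each email body is stored once regardless of how many users receive it; the trade-off is the extra per-email lookup when reading an inbox.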

Re: get_range_slices in hector

2010-04-20 Thread Ran Tavory
We haven't gotten around to implementing this yet and so far no one needed that badly enough to write it. We accept contributions or forks and we use github, so feel free to diy (forks are preferable). http://github.com/rantav/hector On Tue, Apr 20, 2010 at 3:25 AM, Chris Dean ctd...@sokitomi.com

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
When I look at this arrangement, I see one lookup by key for the user, followed by a large read for all the email indexes (these are all columns in the same row, right?). Then one lookup by key for each email. Seems very seek intensive. Would a better way be to index each email with a key

Re: Filters

2010-04-20 Thread Christian Torres
So the suggestion would be to create a column family with the values or states, and save the matches as columns? On Tue, Apr 20, 2010 at 11:27 AM, Roger Schildmeijer schildmei...@gmail.com wrote: My bad. Missed your one-to-one relationship (row key - column ) On 20 apr 2010, at 19.24em,

Re: Filters

2010-04-20 Thread Christian Torres
And the key would be the state or value matched, am I getting it right? On Tue, Apr 20, 2010 at 2:46 PM, Christian Torres chtor...@gmail.com wrote: So the suggestion would be to create a column family with the values or states, and save the matches as columns? On Tue, Apr 20, 2010 at 11:27 AM,

Re: Re: Modelling assets and user permissions

2010-04-20 Thread tsuraan
It seems to me you might get by with putting the actual assets into cassandra (possibly breaking them up into chunks depending on how big they are) and storing the pointers to them in Postgres along with all the other metadata.  If it were me, I'd split each file into a fixed chunksize and
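The fixed-chunk-size idea mentioned above might look like the sketch below. It only shows the splitting and reassembly arithmetic; how the chunks would be named and stored in a column family is left open, and the integer chunk indices used here are an assumption for illustration.

```python
def split_into_chunks(data: bytes, chunk_size: int) -> dict:
    """Split data into fixed-size pieces keyed by chunk index (the last may be shorter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {
        offset // chunk_size: data[offset:offset + chunk_size]
        for offset in range(0, len(data), chunk_size)
    }


def reassemble(chunks: dict) -> bytes:
    """Concatenate the chunks in index order to rebuild the original bytes."""
    return b"".join(chunks[index] for index in sorted(chunks))


payload = bytes(range(10))
pieces = split_into_chunks(payload, 4)
print(len(pieces), reassemble(pieces) == payload)
```

A fixed chunk size keeps every stored value bounded, which matters when large values have to fit in memory on both client and server.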

Using get_range_slices

2010-04-20 Thread Chris Dean
I'd like to use get_range_slices to pull all the keys from a small CF with 10,000 keys. I'd also like to get them in chunks of 100 at a time. Is there a way to do that? I thought I could set start_token and end_token in KeyRange, but I can't figure out what the initial start_token should be.
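One common way to page through a key range is to start from an empty start key and reuse the last key of each page as the start of the next, dropping the repeated key. The sketch below simulates that against a sorted in-memory list; `get_range` is a hypothetical stand-in, not the Thrift call, and it assumes the start key is inclusive and that keys come back in a stable order. Whether that matches a given cluster's partitioner is something to verify.

```python
import bisect


def get_range(sorted_keys, start_key, count):
    """Hypothetical range query: up to `count` keys >= start_key, in order."""
    first = bisect.bisect_left(sorted_keys, start_key)
    return sorted_keys[first:first + count]


def iterate_keys(sorted_keys, page_size=100):
    """Yield pages of keys, restarting each query at the last key already seen."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2")
    start_key = ""
    first_page = True
    while True:
        page = get_range(sorted_keys, start_key, page_size)
        if not first_page:
            page = page[1:]  # the start key was already returned by the previous page
        if not page:
            return
        yield page
        start_key = page[-1]
        first_page = False


keys = sorted("key%05d" % i for i in range(10000))
total = sum(len(page) for page in iterate_keys(keys, 100))
print(total)
```

Because the start key is repeated, every page after the first carries one fewer new key than the requested count.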

Big Data Workshop 4/23 was Re: Cassandra Hackathon in SF @ Digg - 04/22 6:30pm

2010-04-20 Thread Joseph Boyle
Reminder - price goes up after tonight at http://bigdataworkshop.eventbrite.com We now have enough people interested in a bus or van from SF to Mountain View to offer one. Check the interested box when you register and we will send you pickup point information. We will have people from the

Re: How to increase cassandra's performance in read?

2010-04-20 Thread Benjamin Black
On Tue, Apr 20, 2010 at 11:54 AM, Mark Jones mjo...@imagehawk.com wrote: When I look at this arrangement, I see one lookup by key for the user, followed by a large read for all the email indexes  (these are all columns in the same row, right?) Then one lookup by key for each email  

Re: TimeoutException when I put very large value

2010-04-20 Thread Ryan King
what's your RPC timeout in storage-conf? -ryan On Tue, Apr 20, 2010 at 6:46 PM, Jeff Zhang zjf...@gmail.com wrote: Hi all, When I insert a very large value, thrift throws TimeoutException, even if I set the socket timeout to 10 minutes. I believe the 10 minutes is enough for

Re: TimeoutException when I put very large value

2010-04-20 Thread acrd seek
Thanks Ryan, I also noticed this parameter in storage-conf just now. I am going to increase this number to test whether it will work. 2010/4/21 Ryan King r...@twitter.com what's your RPC timeout in storage-conf? -ryan On Tue, Apr 20, 2010 at 6:46 PM, Jeff Zhang zjf...@gmail.com wrote: Hi

Batch row deletion

2010-04-20 Thread Carlos Sanchez
All, Is there or will there be a feature to batch delete rows? (KeyRange delete?) Thanks Carlos

RE: Batch row deletion

2010-04-20 Thread Carlos Sanchez
Awesome thx.. Carlos From: Jonathan Ellis [jbel...@gmail.com] Sent: Tuesday, April 20, 2010 10:52 PM To: user@cassandra.apache.org Subject: Re: Batch row deletion This will be done in https://issues.apache.org/jira/browse/CASSANDRA-293 On Tue, Apr 20,