RE: performance tuning - where does the slowness come from?

2010-05-05 Thread Mark Jones
...@gmail.com] Sent: Wednesday, May 05, 2010 7:04 PM To: user@cassandra.apache.org Subject: Re: performance tuning - where does the slowness come from? On Wed, May 5, 2010 at 6:59 PM, Mark Jones mjo...@imagehawk.com wrote: My data is single row/key to a 500 byte column and I'm

RE: What's the best maximum size for a single column?

2010-04-29 Thread Mark Jones
The max size would probably be best determined by looking at the size of your MemTable. The comment in the config reads: Flush memtable after this much data has been inserted, including overwritten data. There is one memtable per column family, and this threshold is based solely on the amount of data

RE: Problem with JVM? concurrent mode failure

2010-04-29 Thread Mark Jones
One of your problems here is that connect uses a daft connection string convention. You would think node:port, but it's actually node/port. Your connection only succeeded because 9160 is the default when no port is specified. And the keyspace thing that jbellis mentioned. -Original Message-

RE: Cassandra data model for financial data

2010-04-29 Thread Mark Jones
At the moment they all have to fit in memory during compaction. Columns OR SuperColumns (for one Key). From: Andrew Nguyen [mailto:andrew-lists-cassan...@ucsfcti.org] Sent: Thursday, April 29, 2010 10:30 AM To: user@cassandra.apache.org Subject: Re: Cassandra data model for financial data What

RE: OrderPreservingPartitioner limits and workarounds

2010-04-29 Thread Mark Jones
Sounds like you want something like http://oss.oetiker.ch/rrdtool/, assuming you are trying to store computer log data. Do you have any other data that can spread the data load, like a machine name? If so, you can use a hash of that value to place that machine randomly on the net, then

How does cassandra deal with collisions?

2010-04-29 Thread Mark Jones
MD5 is not a perfect hash; it can produce collisions. How are these dealt with? Is there a size appended to them? If 2 keys collide, would that result in a merging of data (if the column names aren't the same) or an overwrite if they were?

RE: Does anybody work about transaction on cassandra ?

2010-04-26 Thread Mark Jones
Orthogonal in this case means at cross purposes. Transactions can't really be done with eventual consistency because not all nodes have all the info at the time the transaction is done. I think they recommend zookeeper for this kind of stuff, but I don't know why you want to use Cassandra vs

org.apache.cassandra.dht.OrderPreservingPartitioner Initial Token

2010-04-23 Thread Mark Jones
How is this specified? Is it a large hex #? A string of bytes in hex? http://wiki.apache.org/cassandra/StorageConfiguration doesn't say.

RE: org.apache.cassandra.dht.OrderPreservingPartitioner Initial Token

2010-04-23 Thread Mark Jones
Ellis [mailto:jbel...@gmail.com] Sent: Friday, April 23, 2010 10:22 AM To: user@cassandra.apache.org Subject: Re: org.apache.cassandra.dht.OrderPreservingPartitioner Initial Token a normal String from the same universe as your keys. On Fri, Apr 23, 2010 at 7:23 AM, Mark Jones mjo...@imagehawk.com

RE: How to insert a row with a TimeUUIDType column in C++

2010-04-23 Thread Mark Jones
Turns out assign can be called with the length as well. So mod your code to be new_col.column.assign((char *)uuid, 16); and you are fixed. -Original Message- From: Mark Jones [mailto:mjo...@imagehawk.com] Sent: Friday, April 23, 2010 10:52 AM To: user@cassandra.apache.org Subject: RE
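
The fix above relies on the two-argument form of std::string::assign, which copies an explicit number of bytes instead of stopping at the first zero byte. A minimal sketch of that idea follows; the helper name uuid_to_binary_string and the constant kUuidLen are made up for illustration and do not come from the thread, and the sketch assumes the binary field being filled is a std::string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A UUID is 16 raw bytes and may legitimately contain 0x00 bytes.
constexpr std::size_t kUuidLen = 16;

// Hypothetical helper (not from the thread): copy a raw 16-byte UUID
// into a std::string without treating it as a NUL-terminated C string.
std::string uuid_to_binary_string(const unsigned char* uuid) {
    assert(uuid != nullptr);
    std::string out;
    // The (pointer, count) overload copies exactly kUuidLen bytes,
    // embedded zero bytes included.
    out.assign(reinterpret_cast<const char*>(uuid), kUuidLen);
    return out;
}
```

Calling uuid_to_binary_string on a 16-byte buffer should always yield a string of size 16. The one-argument assign(const char*) overload, by contrast, treats the buffer as a C string and stops at the first zero byte, which is the likely reason a UUID ends up truncated.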

RE: Trove maps

2010-04-23 Thread Mark Jones
Eliminating GC hell would probably do a lot to help Cassandra maintain speed vs periods of superfast/superslow performance. I look forward to hearing how this experiment goes. From: Eric Hauser [mailto:ewhau...@gmail.com] Sent: Friday, April 23, 2010 3:37 PM To: user@cassandra.apache.org

RE: problem with get_key_range in cassandra 0.4.1

2010-04-21 Thread Mark Jones
Stop the program, wipe the data dir and commit logs, and start the program; that's what I'm doing. I even made a script that will do it so it's just a one line command. From: ROGER PUIG GANZA [mailto:rp...@tid.es] Sent: Wednesday, April 21, 2010 5:20 AM To: cassandra-u...@incubator.apache.org

At what point does the cluster get faster than the individual nodes?

2010-04-21 Thread Mark Jones
I'm seeing a cluster of 4 (replication factor=2) to be about as slow overall as, or barely faster than, the slowest node in the group. When I run the 4 nodes individually, I see: For inserts: two nodes @ 12000/second, 1 node @ 9000/second, 1 node @ 7000/second. For reads: Abysmal, less than

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
I too am seeing very slow performance while testing worst case scenarios of 1 key leading to 1 supercolumn and 1 column beyond that. Key - SuperColumn - 1 Column (of ~ 500 bytes) Drive utilization is 80-90% and I'm only dealing with 50-70 million rows. (With NO swapping) So far, I've found

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
the subcolumns in that supercolumn http://wiki.apache.org/cassandra/CassandraLimitations On Tue, Apr 20, 2010 at 9:50 AM, Mark Jones mjo...@imagehawk.com wrote: I too am seeing very slow performance while testing worst case scenarios of 1 key leading to 1 supercolumn and 1 column beyond

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
at 11:08 AM, Mark Jones mjo...@imagehawk.com wrote: When I first read this, it bothered me because it seemed like it couldn't be so. So I read the link, and it says the whole thing, so I have to ask for some clarification here. I had always assumed a super column was similar to a local

RE: How to increase cassandra's performance in read?

2010-04-20 Thread Mark Jones
email per row, and another CF for UserEmails with per-user index rows referencing the Emails rows. b On Tue, Apr 20, 2010 at 9:44 AM, Mark Jones mjo...@imagehawk.com wrote: To make sure I'm clear on what you are saying: Are the Individual Emails in the example below, Supercolumns

RE: 0.6 insert performance .... Re: [RELEASE] 0.6.1

2010-04-19 Thread Mark Jones
I'm seeing some issues like this as well; in fact, I think seeing your graphs has helped me understand the dynamics of my cluster better. Using some ballpark figures for inserting single column objects of ~500 bytes onto individual nodes (not when combined as a cluster): Node1: Inserts 12000/s

Some insight into the slow read speed. Where to go from here? RC1 MESSAGE-DESERIALIZER-POOL

2010-04-08 Thread Mark Jones
I don't see any way to increase the # of active Deserializers in storage-conf.xml.

Tpstats more than 8 hours after insert/read stop:

Pool Name                Active   Pending   Completed
FILEUTILS-DELETE-POOL         0         0         227
STREAM-STAGE                  0