We have really obvious optimizations to make there that haven't been done because the biggest contributors so far are using RandomPartitioner...
Are you using get_key_range or get_range_slice for scanning? The
former is even slower and deprecated. With get_range_slice your
comparator matters, BytesType is fastest.

-Jonathan

On Wed, Feb 3, 2010 at 6:45 PM, Brian Frank Cooper
<coop...@yahoo-inc.com> wrote:
> 0.5 does seem to be significantly faster - the latency is better and it
> provides significantly more throughput. I'm updating my charts with new
> values now.
>
> One thing that is puzzling is the scan performance. The scan experiment is to
> scan between 1-100 records on each request. My 6 node Cassandra cluster is
> only getting up to about 230 operations/sec, compared to >1400 ops/sec for
> other systems. The latency is quite a bit higher. A chart with these results
> is here:
>
> http://www.brianfrankcooper.net/pubs/scans.png
>
> Is this the expected performance? I'm using the OrderPreservingPartitioner
> with InitialToken values that should evenly partition the data (and the
> amount of data in /var/cassandra/data is about the same on all servers). I'm
> using get_range_slice() from Java (code snippet below).
>
> At the max throughput (230 ops/sec), when latency is over 1.2 sec, CPU usage
> varies from ~5% to ~72% on different boxes. Disk busy varies from 60% to 90%
> (and the machine with the busiest disk is not the one with highest CPU
> usage.) Network utilization (eth0 %util both in and out) varies from 15%-40%
> on different boxes. So clearly there is some imbalance (and the workload
> itself is skewed via a Zipfian distribution) but I'm surprised that the
> latencies are so high even in this case.
>
> Code snippet - fields is a Set<String> listing the columns I want;
> recordcount is the number of records to return.
>
> SlicePredicate predicate;
> if (fields==null)
> {
>     predicate = new SlicePredicate(null,new SliceRange(new byte[0], new
> byte[0],false,1000000));
> }
> else
> {
>     Vector<byte[]> fieldlist=new Vector<byte[]>();
>     for (String s : fields)
>     {
>         fieldlist.add(s.getBytes("UTF-8"));
>     }
>     predicate = new SlicePredicate(fieldlist,null);
> }
> ColumnParent parent = new ColumnParent("data", null);
>
> List<KeySlice> results =
> client.get_range_slice(table,parent,predicate,startkey,"",recordcount,ConsistencyLevel.ONE);
>
> Thanks!
>
> Brian
>
> ________________________________________
> From: Brian Frank Cooper
> Sent: Saturday, January 30, 2010 7:56 AM
> To: cassandra-user@incubator.apache.org
> Subject: RE: Cassandra versus HBase performance study
>
> Good idea, we'll benchmark 0.5 next.
>
> brian
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: Friday, January 29, 2010 1:13 PM
> To: cassandra-user@incubator.apache.org
> Subject: Re: Cassandra versus HBase performance study
>
> Thanks for posting your results; it is an interesting read and we are
> pleased to beat HBase in most workloads. :)
>
> Since you originally benchmarked 0.4.2, you might be interested in the
> speed gains in 0.5. A couple graphs here:
> http://spyced.blogspot.com/2010/01/cassandra-05.html
>
> 0.6 (beta in a few weeks?) is looking even better. :)
>
> -Jonathan