We are using get_range_slice with the AsciiType comparator. I'll try to run a 
test with BytesType, but how much difference do you expect?

I would be interested to know the types of optimizations you are planning. We 
are trying to understand how much of the performance results from fundamental 
design decisions versus how much results from the fact that the system is still 
under development.

Thanks!

Brian

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Wednesday, February 03, 2010 5:05 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra versus HBase performance study

We have really obvious optimizations to make there that haven't been
done because the biggest contributors so far are using
RandomPartitioner...

Are you using get_key_range or get_range_slice for scanning?  The
former is even slower and deprecated.

With get_range_slice your comparator matters, BytesType is fastest.

-Jonathan
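
For context on the comparator point above, here is a minimal sketch of why a raw-byte comparator can be cheaper than one that decodes names to strings first. It is illustrative only and is not Cassandra's actual code: the class name ComparatorSketch, both method names, and the sample column names are hypothetical. It assumes that an ASCII-aware comparison decodes each name before comparing, while a bytes comparison walks the arrays directly.

```java
import java.nio.charset.StandardCharsets;

public class ComparatorSketch {
    // Raw comparison: walk both arrays once, treating each byte as unsigned.
    // No objects are allocated.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // Equal over the shared prefix: the shorter name sorts first.
        return a.length - b.length;
    }

    // Decoding comparison: allocates two Strings on every call, then compares.
    static int compareDecoded(byte[] a, byte[] b) {
        String x = new String(a, StandardCharsets.US_ASCII);
        String y = new String(b, StandardCharsets.US_ASCII);
        return x.compareTo(y);
    }

    public static void main(String[] args) {
        // Hypothetical column names, encoded as plain ASCII.
        byte[] c1 = "field0".getBytes(StandardCharsets.US_ASCII);
        byte[] c2 = "field1".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Integer.signum(compareBytes(c1, c2)));
        System.out.println(Integer.signum(compareDecoded(c1, c2)));
    }
}
```

For plain ASCII names both methods order the names the same way, so the difference would be in per-comparison cost rather than in results. How much that matters for a given scan workload would have to be measured.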

On Wed, Feb 3, 2010 at 6:45 PM, Brian Frank Cooper
<coop...@yahoo-inc.com> wrote:
> 0.5 does seem to be significantly faster - the latency is better and it 
> provides significantly more throughput. I'm updating my charts with new 
> values now.
>
> One thing that is puzzling is the scan performance. The scan experiment is to 
> scan between 1 and 100 records on each request. My 6-node Cassandra cluster is 
> only getting up to about 230 operations/sec, compared to >1400 ops/sec for 
> other systems. The latency is quite a bit higher. A chart with these results 
> is here:
>
> http://www.brianfrankcooper.net/pubs/scans.png
>
> Is this the expected performance? I'm using the OrderPreservingPartitioner 
> with InitialToken values that should evenly partition the data (and the 
> amount of data in /var/cassandra/data is about the same on all servers). I'm 
> using get_range_slice() from Java (code snippet below).
>
> At the max throughput (230 ops/sec), when latency is over 1.2 sec, CPU usage 
> varies from ~5% to ~72% on different boxes. Disk busy varies from 60% to 90% 
> (and the machine with the busiest disk is not the one with highest CPU 
> usage.) Network utilization (eth0 %util both in and out) varies from 15% to 40% 
> on different boxes. So clearly there is some imbalance (and the workload 
> itself is skewed via a Zipfian distribution) but I'm surprised that the 
> latencies are so high even in this case.
>
> Code snippet - fields is a Set<String> listing the columns I want; 
> recordcount is the number of records to return.
>
> SlicePredicate predicate;
> if (fields==null)
> {
>        predicate = new SlicePredicate(null,
>                new SliceRange(new byte[0], new byte[0], false, 1000000));
> }
> else
> {
>        Vector<byte[]> fieldlist=new Vector<byte[]>();
>        for (String s : fields)
>        {
>                fieldlist.add(s.getBytes("UTF-8"));
>        }
>        predicate = new SlicePredicate(fieldlist,null);
> }
> ColumnParent parent = new ColumnParent("data", null);
>
> List<KeySlice> results = 
> client.get_range_slice(table,parent,predicate,startkey,"",recordcount,ConsistencyLevel.ONE);
>
> Thanks!
>
> Brian
>
> ________________________________________
> From: Brian Frank Cooper
> Sent: Saturday, January 30, 2010 7:56 AM
> To: cassandra-user@incubator.apache.org
> Subject: RE: Cassandra versus HBase performance study
>
> Good idea, we'll benchmark 0.5 next.
>
> brian
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: Friday, January 29, 2010 1:13 PM
> To: cassandra-user@incubator.apache.org
> Subject: Re: Cassandra versus HBase performance study
>
> Thanks for posting your results; it is an interesting read and we are
> pleased to beat HBase in most workloads. :)
>
> Since you originally benchmarked 0.4.2, you might be interested in the
> speed gains in 0.5.  A couple graphs here:
> http://spyced.blogspot.com/2010/01/cassandra-05.html
>
> 0.6 (beta in a few weeks?) is looking even better. :)
>
> -Jonathan
>
