Re: Cassandra versus HBase performance study

Jonathan Ellis Wed, 03 Feb 2010 17:06:03 -0800

We have some really obvious optimizations to make in range scanning that
haven't been done yet, because the biggest contributors so far are using
RandomPartitioner...
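For context on why the partitioner choice matters for scans, here is a minimal illustrative sketch (not Cassandra's code) that contrasts the two key placements. It assumes that an order-preserving partitioner uses the key itself as its token, while a hash-based partitioner such as RandomPartitioner derives the token from an MD5 digest of the key. The class and method names are hypothetical. Under hashing, neighboring keys generally land at unrelated ring positions, so a contiguous key range does not map to one contiguous token range.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration: ring order of keys under an order-preserving
// token (the key itself) versus an MD5-based token. Not Cassandra source.
public class PartitionerOrder {

    // MD5 digest of the UTF-8 key bytes, read as a two's-complement
    // BigInteger and made non-negative with abs().
    static BigInteger md5Token(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Ring order when the token is the key itself: plain lexical order.
    static List<String> orderPreserving(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        return sorted;
    }

    // Ring order when the token is the MD5-based value above.
    static List<String> hashed(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(PartitionerOrder::md5Token));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user1", "user2", "user3", "user4", "user5");
        System.out.println("order-preserving: " + orderPreserving(keys));
        System.out.println("hashed:           " + hashed(keys));
    }
}
```

Running main prints the same five keys twice; the first line is in lexical order, while the second follows the digests and is generally not lexical.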


Are you using get_key_range or get_range_slice for scanning?  The
former is even slower and deprecated.

With get_range_slice, your comparator matters: BytesType is fastest.
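One plausible reason the comparator matters is per-comparison cost. The sketch below assumes, for illustration only, that a bytes comparator compares raw unsigned bytes while a text comparator decodes each column name into a String first. It is not Cassandra's BytesType or UTF8Type source, and ComparatorSketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of two ways to order byte[] column names.
public class ComparatorSketch {

    // Lexicographic comparison of unsigned bytes: no allocation, no decoding.
    // A shorter array that is a prefix of the longer one sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // treat each byte as 0..255
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Decode-then-compare: allocates two Strings on every call.
    // Assumed here to approximate a comparator that decodes UTF-8 text.
    static int compareDecoded(byte[] a, byte[] b) {
        String s = new String(a, StandardCharsets.UTF_8);
        String t = new String(b, StandardCharsets.UTF_8);
        return s.compareTo(t);
    }
}
```

The decoding variant does extra work and allocation per call, which can add up when a slice compares many column names.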

-Jonathan

On Wed, Feb 3, 2010 at 6:45 PM, Brian Frank Cooper
<coop...@yahoo-inc.com> wrote:
> 0.5 does seem to be significantly faster: the latency is lower and it
> provides noticeably more throughput. I'm updating my charts with the new
> values now.
>
> One thing that is puzzling is the scan performance. In the scan experiment,
> each request scans between 1 and 100 records. My 6-node Cassandra cluster
> only gets up to about 230 operations/sec, compared to >1400 ops/sec for
> other systems, and the latency is quite a bit higher. A chart with these
> results is here:
>
> http://www.brianfrankcooper.net/pubs/scans.png
>
> Is this the expected performance? I'm using the OrderPreservingPartitioner 
> with InitialToken values that should evenly partition the data (and the 
> amount of data in /var/cassandra/data is about the same on all servers). I'm 
> using get_range_slice() from Java (code snippet below).
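With the OrderPreservingPartitioner, an even split depends on choosing InitialToken values that match the actual key distribution. A minimal sketch, assuming you can sample row keys and that each node owns the key range ending at its own token, is to sort the sample and take evenly spaced quantiles as tokens. TokenPicker and pickTokens are hypothetical names, not part of Cassandra.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: chooses one token per node at evenly spaced
// quantiles of a sorted key sample, so each node owns about the same
// number of sampled keys under an order-preserving layout.
public class TokenPicker {

    static List<String> pickTokens(List<String> sampleKeys, int nodes) {
        if (nodes < 1 || sampleKeys.size() < nodes) {
            throw new IllegalArgumentException("need at least one sampled key per node");
        }
        List<String> sorted = new ArrayList<>(sampleKeys);
        Collections.sort(sorted);
        List<String> tokens = new ArrayList<>();
        for (int i = 1; i <= nodes; i++) {
            // Last key of the i-th equal-sized slice of the sorted sample.
            int idx = (int) ((long) i * sorted.size() / nodes) - 1;
            tokens.add(sorted.get(idx));
        }
        return tokens;
    }
}
```

For 100 sampled keys k00 through k99 and 4 nodes, this picks k24, k49, k74 and k99.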
>
> At the max throughput (230 ops/sec), when latency is over 1.2 sec, CPU usage 
> varies from ~5% to ~72% on different boxes. Disk busy varies from 60% to 90% 
> (and the machine with the busiest disk is not the one with the highest CPU
> usage). Network utilization (eth0 %util, both in and out) varies from 15% to
> 40% on different boxes. So clearly there is some imbalance (and the workload
> itself is skewed via a Zipfian distribution) but I'm surprised that the 
> latencies are so high even in this case.
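The workload described above (1 to 100 records per scan, with Zipfian-skewed access) can be approximated for experimentation with a generator like the following. This is a hypothetical sketch, not the benchmark's actual code: it draws the starting record rank from a Zipfian distribution over n records with exponent theta using an inverse-CDF lookup, and draws the scan length uniformly from 1 to maxScan.

```java
import java.util.Random;

// Hypothetical scan-request generator: Zipfian start rank, uniform length.
public class ScanWorkload {
    private final double[] cdf; // cumulative probabilities; cdf[n - 1] == 1.0
    private final int maxScan;
    private final Random rng;

    ScanWorkload(int n, double theta, int maxScan, long seed) {
        if (n < 1 || maxScan < 1) {
            throw new IllegalArgumentException("n and maxScan must be >= 1");
        }
        cdf = new double[n];
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += 1.0 / Math.pow(i + 1, theta); // weight of rank i
            cdf[i] = sum;
        }
        for (int i = 0; i < n; i++) {
            cdf[i] /= sum; // normalize into a CDF
        }
        this.maxScan = maxScan;
        this.rng = new Random(seed);
    }

    // Smallest rank whose CDF value reaches a uniform draw; rank 0 is hottest.
    int nextStart() {
        double u = rng.nextDouble();
        int lo = 0, hi = cdf.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] < u) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Number of records to scan, uniform in [1, maxScan].
    int nextScanLength() {
        return 1 + rng.nextInt(maxScan);
    }
}
```

A fixed seed makes runs repeatable, which helps when comparing systems on the same request sequence.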
>
> Code snippet: fields is a Set<String> listing the columns I want, and
> recordcount is the number of records to return.
>
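> // Build the column predicate: all columns if fields is null,
> // otherwise only the named columns.
> SlicePredicate predicate;
> if (fields == null)
> {
>     // Empty start/finish means no bound; return up to 1,000,000 columns.
>     predicate = new SlicePredicate(null,
>             new SliceRange(new byte[0], new byte[0], false, 1000000));
> }
> else
> {
>     List<byte[]> fieldlist = new ArrayList<byte[]>();
>     for (String s : fields)
>     {
>         fieldlist.add(s.getBytes("UTF-8")); // column names as UTF-8 bytes
>     }
>     predicate = new SlicePredicate(fieldlist, null);
> }
> // Column family "data", no super column.
> ColumnParent parent = new ColumnParent("data", null);
>
> // table holds the keyspace name. Scan from startkey with no end key ("")
> // and return at most recordcount rows, reading from a single replica.
> List<KeySlice> results =
>         client.get_range_slice(table, parent, predicate, startkey, "",
>                 recordcount, ConsistencyLevel.ONE);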
>
> Thanks!
>
> Brian
>
> ________________________________________
> From: Brian Frank Cooper
> Sent: Saturday, January 30, 2010 7:56 AM
> To: cassandra-user@incubator.apache.org
> Subject: RE: Cassandra versus HBase performance study
>
> Good idea; we'll benchmark 0.5 next.
>
> brian
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: Friday, January 29, 2010 1:13 PM
> To: cassandra-user@incubator.apache.org
> Subject: Re: Cassandra versus HBase performance study
>
> Thanks for posting your results; it is an interesting read and we are
> pleased to beat HBase in most workloads. :)
>
> Since you originally benchmarked 0.4.2, you might be interested in the
> speed gains in 0.5. A couple of graphs are here:
> http://spyced.blogspot.com/2010/01/cassandra-05.html
>
> 0.6 (beta in a few weeks?) is looking even better. :)
>
> -Jonathan
>
