Re: Cassandra versus HBase performance study

2010-02-21 Thread Jonathan Ellis
On Wed, Feb 3, 2010 at 7:45 PM, Brian Frank Cooper
coop...@yahoo-inc.com wrote:
 One thing that is puzzling is the scan performance. The scan experiment is to 
 scan between 1-100 records on each request. My 6 node Cassandra cluster is 
 only getting up to about 230 operations/sec, compared to 1400 ops/sec for 
 other systems. The latency is quite a bit higher. A chart with these results 
 is here:

 http://www.brianfrankcooper.net/pubs/scans.png

 Is this the expected performance? I'm using the OrderPreservingPartitioner 
 with InitialToken values that should evenly partition the data (and the 
 amount of data in /var/cassandra/data is about the same on all servers). I'm 
 using get_range_slice() from Java (code snippet below).

This got some attention for 0.6, since we have added Hadoop support in
that release.  (0.6 is branched now, Beta / RC coming soon.)  Turns
out that the main bottleneck (or, more likely, a main bottleneck :) was that our
memtables were not kept ordered by key, so they had to be sorted for each range
query.  Switching from NonBlockingHashMap to ConcurrentSkipListMap made things
much faster.  (CASSANDRA-799)
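
To sketch the difference (illustrative code only, not the actual memtable
implementation; ConcurrentHashMap stands in below for the unsorted map we had
been using), an unsorted concurrent map forces a copy-and-sort of all keys on
every range query, while a ConcurrentSkipListMap keeps keys ordered so a range
is just a subMap() view:

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class MemtableRangeSketch
{
    // Unsorted map: every range query pays for copying and sorting all keys.
    static List<String> rangeFromUnsorted(ConcurrentHashMap<String, byte[]> memtable,
                                          String start, String end)
    {
        List<String> keys = new ArrayList<String>(memtable.keySet());
        Collections.sort(keys);                       // repeated on every query
        List<String> hits = new ArrayList<String>();
        for (String k : keys)
            if (k.compareTo(start) >= 0 && k.compareTo(end) < 0)
                hits.add(k);
        return hits;
    }

    // Sorted map: keys are maintained in order, so a range is a cheap view.
    static Collection<String> rangeFromSorted(ConcurrentSkipListMap<String, byte[]> memtable,
                                              String start, String end)
    {
        return memtable.subMap(start, end).keySet();  // no per-query sort
    }
}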

We're planning on optimizing this more for 0.7, and we've added range
queries to our stress test tool (CASSANDRA-765) for that.

-Jonathan


Re: Cassandra versus HBase performance study

2010-02-05 Thread Jonathan Ellis
On Fri, Feb 5, 2010 at 4:51 PM, Brian Frank Cooper
coop...@yahoo-inc.com wrote:
 Yes, I had used the default 0.1.

 These boxes have 8 GB of RAM and I was giving 6 GB to the JVM (-Xmx). Does 
 Cassandra do any read caching of data? It seems from the text in 
 storage-conf.xml that KeysCachedFraction refers only to caching the key index, 
 not the row contents, so I would imagine increasing the keys cached fraction 
 would decrease the memory available for data caching.

0.5 doesn't do data caching (except what you get for free from the
OS).  0.6 will change this, but for 0.5 things are nice and simple. :)
So a large heap can actually make things worse if it makes the GC lazy and
leaves the OS with less memory to cache data.

If your data set is larger than 5GB on disk I would say give it a
KeysCachedFraction of 0.2 and a 3GB JVM heap.  Otherwise, 0.4 and 5GB.
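
In code form (purely illustrative; the class and method names below are mine,
not anything in Cassandra), that rule of thumb for an 8 GB box comes out to:

// Illustrative only: encodes the rule of thumb above.  The chosen heap size
// goes on the JVM command line (-Xmx) and the fraction into the
// KeysCachedFraction setting; whatever RAM the JVM does not take is left to
// the OS page cache.
public class HeapRuleOfThumb
{
    static double keysCachedFraction(double dataSetGbOnDisk)
    {
        return dataSetGbOnDisk > 5 ? 0.2 : 0.4;
    }

    static int heapGb(double dataSetGbOnDisk)
    {
        return dataSetGbOnDisk > 5 ? 3 : 5;
    }
}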

-Jonathan


Re: Cassandra versus HBase performance study

2010-02-04 Thread Ian Holsman
Hi Brian.
Were there any performance changes on the other tests with v0.5?
The graphs on the other pages look virtually identical.

On Feb 4, 2010, at 11:45 AM, Brian Frank Cooper wrote:

 0.5 does seem to be significantly faster - the latency is better and it 
 provides significantly more throughput. I'm updating my charts with new 
 values now.
 
 One thing that is puzzling is the scan performance. The scan experiment is to 
 scan between 1-100 records on each request. My 6 node Cassandra cluster is 
 only getting up to about 230 operations/sec, compared to 1400 ops/sec for 
 other systems. The latency is quite a bit higher. A chart with these results 
 is here:
 
 http://www.brianfrankcooper.net/pubs/scans.png
 
 Is this the expected performance? I'm using the OrderPreservingPartitioner 
 with InitialToken values that should evenly partition the data (and the 
 amount of data in /var/cassandra/data is about the same on all servers). I'm 
 using get_range_slice() from Java (code snippet below). 
 
 At the max throughput (230 ops/sec), when latency is over 1.2 sec, CPU usage 
 varies from ~5% to ~72% on different boxes. Disk busy varies from 60% to 90% 
 (and the machine with the busiest disk is not the one with highest CPU 
 usage.) Network utilization (eth0 %util both in and out) varies from 15%-40% 
 on different boxes. So clearly there is some imbalance (and the workload 
 itself is skewed via a Zipfian distribution) but I'm surprised that the 
 latencies are so high even in this case.
 
 Code snippet - fields is a Set<String> listing the columns I want; 
 recordcount is the number of records to return.
 
 SlicePredicate predicate;
 if (fields == null)
 {
   predicate = new SlicePredicate(null, new SliceRange(new byte[0], new byte[0], false, 100));
 }
 else
 {
   Vector<byte[]> fieldlist = new Vector<byte[]>();
   for (String s : fields)
   {
     fieldlist.add(s.getBytes("UTF-8"));
   }
   predicate = new SlicePredicate(fieldlist, null);
 }
 ColumnParent parent = new ColumnParent("data", null);

 List<KeySlice> results =
   client.get_range_slice(table, parent, predicate, startkey, "", recordcount, ConsistencyLevel.ONE);
   
 Thanks!
 
 Brian
 
 

--
Ian Holsman
i...@holsman.net





RE: Cassandra versus HBase performance study

2010-02-03 Thread Brian Frank Cooper
0.5 does seem to be significantly faster - the latency is better and it 
provides significantly more throughput. I'm updating my charts with new values 
now.

One thing that is puzzling is the scan performance. The scan experiment is to 
scan between 1-100 records on each request. My 6 node Cassandra cluster is only 
getting up to about 230 operations/sec, compared to 1400 ops/sec for other 
systems. The latency is quite a bit higher. A chart with these results is here:

http://www.brianfrankcooper.net/pubs/scans.png

Is this the expected performance? I'm using the OrderPreservingPartitioner with 
InitialToken values that should evenly partition the data (and the amount of 
data in /var/cassandra/data is about the same on all servers). I'm using 
get_range_slice() from Java (code snippet below). 

At the max throughput (230 ops/sec), when latency is over 1.2 sec, CPU usage 
varies from ~5% to ~72% on different boxes. Disk busy varies from 60% to 90% 
(and the machine with the busiest disk is not the one with highest CPU usage.) 
Network utilization (eth0 %util both in and out) varies from 15%-40% on 
different boxes. So clearly there is some imbalance (and the workload itself is 
skewed via a Zipfian distribution) but I'm surprised that the latencies are so 
high even in this case.

Code snippet - fields is a Set<String> listing the columns I want; recordcount
is the number of records to return.

SlicePredicate predicate;
if (fields == null)
{
    predicate = new SlicePredicate(null, new SliceRange(new byte[0], new byte[0], false, 100));
}
else
{
    Vector<byte[]> fieldlist = new Vector<byte[]>();
    for (String s : fields)
    {
        fieldlist.add(s.getBytes("UTF-8"));
    }
    predicate = new SlicePredicate(fieldlist, null);
}
ColumnParent parent = new ColumnParent("data", null);

List<KeySlice> results =
    client.get_range_slice(table, parent, predicate, startkey, "", recordcount, ConsistencyLevel.ONE);
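
For completeness, here is a sketch (not part of the original snippet; it assumes
the 0.5-era Thrift objects where KeySlice exposes its key and columns as public
fields, so adjust the accessors to whatever your generated client provides) of
reading the returned rows back out:

// Sketch only: iterate the rows and columns returned by get_range_slice.
// Field names (key, columns, column, name, value) follow the Thrift-generated
// classes of this era; substitute getters if your client uses them.
for (KeySlice ks : results)
{
    System.out.println("row key: " + ks.key);
    for (ColumnOrSuperColumn cosc : ks.columns)
    {
        Column c = cosc.column;   // standard (non-super) column family
        System.out.println("  " + new String(c.name, "UTF-8") + " = "
                           + new String(c.value, "UTF-8"));
    }
}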

Thanks!

Brian




Re: Cassandra versus HBase performance study

2010-02-03 Thread Jonathan Ellis
We have really obvious optimizations to make there that haven't been
done because the biggest contributors so far are using
RandomPartitioner...

Are you using get_key_range or get_range_slice for scanning?  The
former is even slower and deprecated.

With get_range_slice your comparator matters; BytesType is fastest.
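
To illustrate why (sketch only, not Cassandra's actual comparator classes): a
BytesType-style comparator can compare column names as raw unsigned bytes,
while a comparator that decodes the bytes to text first pays for an allocation
and a charset decode on every comparison.

import java.nio.charset.Charset;

public class ComparatorSketch
{
    private static final Charset ASCII = Charset.forName("US-ASCII");

    // BytesType-style comparison: unsigned lexicographic compare of raw bytes.
    static int compareRawBytes(byte[] a, byte[] b)
    {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++)
        {
            int ai = a[i] & 0xff, bi = b[i] & 0xff;   // treat bytes as unsigned
            if (ai != bi)
                return ai - bi;
        }
        return a.length - b.length;
    }

    // Text-style comparison: decode to String first, then compare.  The extra
    // allocation and decoding is the work BytesType avoids.
    static int compareAsDecodedString(byte[] a, byte[] b)
    {
        return new String(a, ASCII).compareTo(new String(b, ASCII));
    }
}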

-Jonathan

On Wed, Feb 3, 2010 at 6:45 PM, Brian Frank Cooper
coop...@yahoo-inc.com wrote:
 0.5 does seem to be significantly faster - the latency is better and it 
 provides significantly more throughput. I'm updating my charts with new 
 values now.

 One thing that is puzzling is the scan performance. The scan experiment is to 
 scan between 1-100 records on each request. My 6 node Cassandra cluster is 
 only getting up to about 230 operations/sec, compared to 1400 ops/sec for 
 other systems. The latency is quite a bit higher. A chart with these results 
 is here:

 http://www.brianfrankcooper.net/pubs/scans.png

 Is this the expected performance? I'm using the OrderPreservingPartitioner 
 with InitialToken values that should evenly partition the data (and the 
 amount of data in /var/cassandra/data is about the same on all servers). I'm 
 using get_range_slice() from Java (code snippet below).

 At the max throughput (230 ops/sec), when latency is over 1.2 sec, CPU usage 
 varies from ~5% to ~72% on different boxes. Disk busy varies from 60% to 90% 
 (and the machine with the busiest disk is not the one with highest CPU 
 usage.) Network utilization (eth0 %util both in and out) varies from 15%-40% 
 on different boxes. So clearly there is some imbalance (and the workload 
 itself is skewed via a Zipfian distribution) but I'm surprised that the 
 latencies are so high even in this case.

 Code snippet - fields is a Set<String> listing the columns I want; 
 recordcount is the number of records to return.

 SlicePredicate predicate;
 if (fields == null)
 {
        predicate = new SlicePredicate(null, new SliceRange(new byte[0], new byte[0], false, 100));
 }
 else
 {
        Vector<byte[]> fieldlist = new Vector<byte[]>();
        for (String s : fields)
        {
                fieldlist.add(s.getBytes("UTF-8"));
        }
        predicate = new SlicePredicate(fieldlist, null);
 }
 ColumnParent parent = new ColumnParent("data", null);

 List<KeySlice> results =
   client.get_range_slice(table, parent, predicate, startkey, "", recordcount, ConsistencyLevel.ONE);

 Thanks!

 Brian

 



RE: Cassandra versus HBase performance study

2010-02-03 Thread Brian Frank Cooper
We are using get_range_slice and the AsciiType comparator. I'll try a run with
BytesType, but how much difference do you expect?

I would be interested to know what types of optimizations you are planning. We
are trying to understand how much of the performance comes from fundamental
design decisions versus how much comes from the fact that the system is still
under development.

Thanks!

Brian

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Wednesday, February 03, 2010 5:05 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra versus HBase performance study

We have really obvious optimizations to make there that haven't been
done because the biggest contributors so far are using
RandomPartitioner...

Are you using get_key_range or get_range_slice for scanning?  The
former is even slower and deprecated.

With get_range_slice your comparator matters; BytesType is fastest.

-Jonathan




RE: Cassandra versus HBase performance study

2010-01-30 Thread Brian Frank Cooper
Good idea, we'll benchmark 0.5 next.

brian

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Friday, January 29, 2010 1:13 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra versus HBase performance study

Thanks for posting your results; it is an interesting read and we are
pleased to beat HBase in most workloads. :)

Since you originally benchmarked 0.4.2, you might be interested in the
speed gains in 0.5.  A couple graphs here:
http://spyced.blogspot.com/2010/01/cassandra-05.html

0.6 (beta in a few weeks?) is looking even better. :)

-Jonathan

On Fri, Jan 29, 2010 at 2:35 PM, Brian Frank Cooper
coop...@yahoo-inc.com wrote:
 Hi folks,



 We have been conducting a performance study comparing Cassandra and HBase
 (and Yahoo! PNUTS and MySQL) on identical hardware under identical
 workloads. Our focus has been on serving workloads (e.g. read and write
 individual records, rather than scan a whole table for MapReduce.) This is
 part of a larger effort to develop a benchmark for these kinds of systems
 (which we are calling YCSB, or the Yahoo Cloud Serving Benchmark.)



 I thought this list might be interested in the first set of results we have.
 We submitted a paper on these results, and the benchmark as a whole, and we
 are continuing to benchmark other scenarios and systems. But we have
 produced a snapshot of the results if you are interested:



 High level summary: http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf

 Detailed paper: http://www.brianfrankcooper.net/pubs/ycsb.pdf



 In general, Cassandra performs quite well, with good throughput and latency
 compared to PNUTS (which we call Sherpa internally) and better throughput
 than HBase.



 I'd be happy to answer any questions about the results or discuss possible
 ways to tune Cassandra. We had already received extensive tuning help from
 this list last year (thanks!) but more suggestions are always helpful.



 The benchmark tool will be open sourced real soon now (we are just waiting
 for final approval from Yahoo legal) and our hope is that it is a useful
 tool for apples-to-apples comparison of different systems.



 Brian



 --

 Brian Cooper

 Principal Research Scientist

 Yahoo! Research