Re: Cassandra versus HBase performance study
On Wed, Feb 3, 2010 at 7:45 PM, Brian Frank Cooper coop...@yahoo-inc.com wrote:

One thing that is puzzling is the scan performance. The scan experiment is to scan between 1-100 records on each request. My 6 node Cassandra cluster is only getting up to about 230 operations/sec, compared to 1400 ops/sec for other systems. The latency is quite a bit higher. A chart with these results is here: http://www.brianfrankcooper.net/pubs/scans.png Is this the expected performance? I'm using the OrderPreservingPartitioner with InitialToken values that should evenly partition the data (and the amount of data in /var/cassandra/data is about the same on all servers). I'm using get_range_slice() from Java (code snippet below).

This got some attention for 0.6, since we added Hadoop support in that release. (0.6 is branched now; beta / RC coming soon.) It turns out the (or, more likely, a) main bottleneck was that our memtables were not kept ordered by key, so we had to sort them for each range query. Switching from NonBlockingHashMap to ConcurrentSkipListMap made things much faster (CASSANDRA-799). We're planning to optimize this more for 0.7, and we've added range queries to our stress test tool (CASSANDRA-765) for that.

-Jonathan
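The effect Jonathan describes is easy to picture: a hash map has no key order, so every range slice forces a full sort of the memtable, while a skip list keeps keys sorted and hands back a range as a cheap view. A minimal standalone sketch (not Cassandra's actual memtable code; keys and values here are invented) of the ConcurrentSkipListMap side:

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class MemtableRangeSketch {
    public static void main(String[] args) {
        // A skip list keeps entries sorted by key at all times,
        // so a range slice is a view, not a per-query sort.
        ConcurrentSkipListMap<String, String> memtable = new ConcurrentSkipListMap<>();
        memtable.put("user3", "c");
        memtable.put("user1", "a");
        memtable.put("user5", "e");
        memtable.put("user2", "b");

        // Range query [user1, user4): O(log n) to locate the start,
        // then iterate in key order.
        ConcurrentNavigableMap<String, String> slice = memtable.subMap("user1", "user4");
        System.out.println(slice.keySet()); // [user1, user2, user3]
    }
}
```

With a NonBlockingHashMap the same slice would require copying and sorting every key on each request, which matches the pre-CASSANDRA-799 behavior described above.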
Re: Cassandra versus HBase performance study
On Fri, Feb 5, 2010 at 4:51 PM, Brian Frank Cooper coop...@yahoo-inc.com wrote:

Yes, I had used the default 0.1. These boxes have 8 GB of RAM and I was giving 6 GB to the JVM (-Xmx). Does Cassandra do read caching of data? From the text in storage.conf, it seems that the keys cached fraction refers only to indexing the keys, not caching the content, so I would imagine increasing the keys cached fraction would decrease the memory available for data caching.

0.5 doesn't do data caching (except what you get for free from the OS). 0.6 will change this, but for 0.5 things are nice and simple. :) So a large heap can actually make things worse if it makes the GC lazy and the OS can't use that memory to cache data. If your data set is larger than 5 GB on disk, I would give it a KeysCachedFraction of 0.2 and a 3 GB JVM heap; otherwise 0.4 and 5 GB.

-Jonathan
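For concreteness, Jonathan's two rules of thumb might look like the following. This is an illustrative sketch only: the element placement in storage-conf.xml and the exact option names should be checked against your own 0.5 config files.

```xml
<!-- storage-conf.xml: data set larger than ~5 GB on disk.
     Small heap leaves RAM free for the OS page cache. -->
<KeysCachedFraction>0.2</KeysCachedFraction>
<!-- JVM options (e.g. in cassandra.in.sh): -Xmx3G -->

<!-- Data set of ~5 GB or less on disk: cache more keys, larger heap. -->
<KeysCachedFraction>0.4</KeysCachedFraction>
<!-- JVM options: -Xmx5G -->
```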
Re: Cassandra versus HBase performance study
Hi Brian, were there any performance changes on the other tests with v0.5? The graphs on the other pages look remarkably identical.

On Feb 4, 2010, at 11:45 AM, Brian Frank Cooper wrote:

0.5 does seem to be significantly faster - the latency is better and it provides significantly more throughput. I'm updating my charts with new values now.

One thing that is puzzling is the scan performance. The scan experiment is to scan between 1-100 records on each request. My 6 node Cassandra cluster is only getting up to about 230 operations/sec, compared to 1400 ops/sec for other systems. The latency is quite a bit higher. A chart with these results is here: http://www.brianfrankcooper.net/pubs/scans.png Is this the expected performance? I'm using the OrderPreservingPartitioner with InitialToken values that should evenly partition the data (and the amount of data in /var/cassandra/data is about the same on all servers). I'm using get_range_slice() from Java (code snippet below).

At the max throughput (230 ops/sec), when latency is over 1.2 sec, CPU usage varies from ~5% to ~72% on different boxes. Disk busy varies from 60% to 90% (and the machine with the busiest disk is not the one with the highest CPU usage). Network utilization (eth0 %util, both in and out) varies from 15%-40% on different boxes. So clearly there is some imbalance (and the workload itself is skewed via a Zipfian distribution), but I'm surprised that the latencies are so high even in this case.

Code snippet - fields is a Set<String> listing the columns I want; recordcount is the number of records to return.
SlicePredicate predicate;
if (fields == null) {
    predicate = new SlicePredicate(null, new SliceRange(new byte[0], new byte[0], false, 100));
} else {
    Vector<byte[]> fieldlist = new Vector<byte[]>();
    for (String s : fields) {
        fieldlist.add(s.getBytes("UTF-8"));
    }
    predicate = new SlicePredicate(fieldlist, null);
}
ColumnParent parent = new ColumnParent("data", null);
List<KeySlice> results = client.get_range_slice(table, parent, predicate, startkey, "", recordcount, ConsistencyLevel.ONE);

Thanks!
Brian

--
Ian Holsman
i...@holsman.net
RE: Cassandra versus HBase performance study
0.5 does seem to be significantly faster - the latency is better and it provides significantly more throughput. I'm updating my charts with new values now.

One thing that is puzzling is the scan performance. The scan experiment is to scan between 1-100 records on each request. My 6 node Cassandra cluster is only getting up to about 230 operations/sec, compared to 1400 ops/sec for other systems. The latency is quite a bit higher. A chart with these results is here: http://www.brianfrankcooper.net/pubs/scans.png Is this the expected performance? I'm using the OrderPreservingPartitioner with InitialToken values that should evenly partition the data (and the amount of data in /var/cassandra/data is about the same on all servers). I'm using get_range_slice() from Java (code snippet below).

At the max throughput (230 ops/sec), when latency is over 1.2 sec, CPU usage varies from ~5% to ~72% on different boxes. Disk busy varies from 60% to 90% (and the machine with the busiest disk is not the one with the highest CPU usage). Network utilization (eth0 %util, both in and out) varies from 15%-40% on different boxes. So clearly there is some imbalance (and the workload itself is skewed via a Zipfian distribution), but I'm surprised that the latencies are so high even in this case.

Code snippet - fields is a Set<String> listing the columns I want; recordcount is the number of records to return.

SlicePredicate predicate;
if (fields == null) {
    predicate = new SlicePredicate(null, new SliceRange(new byte[0], new byte[0], false, 100));
} else {
    Vector<byte[]> fieldlist = new Vector<byte[]>();
    for (String s : fields) {
        fieldlist.add(s.getBytes("UTF-8"));
    }
    predicate = new SlicePredicate(fieldlist, null);
}
ColumnParent parent = new ColumnParent("data", null);
List<KeySlice> results = client.get_range_slice(table, parent, predicate, startkey, "", recordcount, ConsistencyLevel.ONE);

Thanks!
Brian
Re: Cassandra versus HBase performance study
We have really obvious optimizations to make there that haven't been done, because the biggest contributors so far are using RandomPartitioner... Are you using get_key_range or get_range_slice for scanning? The former is even slower, and it is deprecated. With get_range_slice your comparator matters; BytesType is fastest.

-Jonathan

On Wed, Feb 3, 2010 at 6:45 PM, Brian Frank Cooper coop...@yahoo-inc.com wrote:

0.5 does seem to be significantly faster - the latency is better and it provides significantly more throughput. I'm updating my charts with new values now. One thing that is puzzling is the scan performance. [...]
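For reference, the comparator Jonathan mentions is configured per column family in storage-conf.xml. A minimal illustrative sketch of switching the benchmark's column family (the ColumnFamily name "data" is from Brian's snippet; the surrounding keyspace element is assumed, and the attribute names follow the 0.5/0.6 config format):

```xml
<!-- BytesType compares column names as raw bytes, skipping the
     per-comparison ASCII validation that AsciiType performs. -->
<ColumnFamily Name="data" CompareWith="BytesType"/>
```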
RE: Cassandra versus HBase performance study
We are using get_range_slice with the AsciiType comparator. I'll try to run a test with BytesType, but how much difference do you expect?

I would be interested to know the types of optimizations you are planning. We are trying to understand how much of the performance results from fundamental design decisions, versus how much results from the fact that the system is still under development.

Thanks!
Brian

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Wednesday, February 03, 2010 5:05 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra versus HBase performance study

We have really obvious optimizations to make there that haven't been done, because the biggest contributors so far are using RandomPartitioner... Are you using get_key_range or get_range_slice for scanning? The former is even slower, and it is deprecated. With get_range_slice your comparator matters; BytesType is fastest.

-Jonathan

On Wed, Feb 3, 2010 at 6:45 PM, Brian Frank Cooper coop...@yahoo-inc.com wrote: [...]
RE: Cassandra versus HBase performance study
Good idea, we'll benchmark 0.5 next.

brian

-----Original Message-----
From: Jonathan Ellis [mailto:jbel...@gmail.com]
Sent: Friday, January 29, 2010 1:13 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra versus HBase performance study

Thanks for posting your results; it is an interesting read, and we are pleased to beat HBase in most workloads. :) Since you originally benchmarked 0.4.2, you might be interested in the speed gains in 0.5. A couple of graphs are here: http://spyced.blogspot.com/2010/01/cassandra-05.html 0.6 (beta in a few weeks?) is looking even better. :)

-Jonathan

On Fri, Jan 29, 2010 at 2:35 PM, Brian Frank Cooper coop...@yahoo-inc.com wrote:

Hi folks,

We have been conducting a performance study comparing Cassandra and HBase (and Yahoo! PNUTS and MySQL) on identical hardware under identical workloads. Our focus has been on serving workloads (e.g. reading and writing individual records, rather than scanning a whole table for MapReduce). This is part of a larger effort to develop a benchmark for these kinds of systems, which we are calling YCSB, the Yahoo! Cloud Serving Benchmark. I thought this list might be interested in the first set of results we have.

We submitted a paper on these results and the benchmark as a whole, and we are continuing to benchmark other scenarios and systems, but we have produced a snapshot of the results if you are interested:

High level summary: http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
Detailed paper: http://www.brianfrankcooper.net/pubs/ycsb.pdf

In general, Cassandra performs quite well, with good throughput and latency compared to PNUTS (which we call Sherpa internally) and better throughput than HBase. I'd be happy to answer any questions about the results or discuss possible ways to tune Cassandra. We had already received extensive tuning help from this list last year (thanks!), but more suggestions are always helpful.
The benchmark tool will be open sourced real soon now (we are just waiting for final approval from Yahoo! legal), and our hope is that it will be a useful tool for apples-to-apples comparisons of different systems.

Brian

--
Brian Cooper
Principal Research Scientist
Yahoo! Research