Re: 2014 nosql benchmark

2014-12-19 Thread Philo Yang
Today I've also seen this benchmark on Chinese websites. SequoiaDB seems
to come from a Chinese startup company, and in the db-engines ranking
http://db-engines.com/en/ranking its score is 0.00. So IMO this benchmark
is a soft sell. They compare three databases, two written in C++ and one
in Java, and use a very tricky test case so that Cassandra cannot hold all
data in memtables. After all, Java needs more memory than C++. For an
on-disk database, the data size of one node is generally much larger than
RAM, and its in-memory query performance is less important than its
on-disk query performance.

So I think this benchmark has no value at all.
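
To make the data-size-versus-RAM point concrete, here is a rough Python sketch of how many rows it takes to reach a given multiple of a node's RAM. The 32 GiB of RAM and the 1 KiB row size are assumptions picked for illustration only; neither number is taken from the benchmark PDF, and the function name is hypothetical.

```python
# Rough arithmetic sketch. RAM_BYTES and ROW_BYTES are ASSUMED values,
# not figures from the benchmark under discussion.

def rows_for_ratio(ram_bytes: int, row_bytes: int, ratio: float) -> int:
    """Return the number of rows whose raw size is `ratio` times RAM."""
    return int(ram_bytes * ratio) // row_bytes

RAM_BYTES = 32 * 1024**3  # assumed: 32 GiB of RAM per node
ROW_BYTES = 1024          # assumed: 1 KiB of raw data per row

# A data set that fits in RAM (ratio < 1) versus one that does not (ratio > 1).
for ratio in (0.25, 0.5, 1.0, 4.0):
    rows = rows_for_ratio(RAM_BYTES, ROW_BYTES, ratio)
    print(f"{ratio:>5.2f} x RAM -> {rows:,} rows")
```

Raw row size ignores per-row overhead in the database and in the JVM, so the real point at which data stops fitting in memory would be lower than this estimate.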

2014-12-19 14:47 GMT+08:00 Wilm Schumacher wilm.schumac...@gmail.com:

  Hi,

 I'm always interested in such benchmark experiments, because the
 databases evolve so fast that the race is always open and there is a lot
 of motion in there.

 And of course I asked myself the same question. And I think that this
 publication is unreliable, for 4 reasons (from reading very fast, perhaps
 there are more):

 1.) It is unclear what this is all about. The title is "NoSQL Performance
 Testing". The subtitle is "In-Memory Performance Comparison of SequoiaDB,
 Cassandra, and MongoDB". However, in the introduction there is not one
 word about in-memory performance. The introduction could be a general
 introduction for a general on-disk NoSQL benchmark. So ... only the
 subtitle (and a short sentence in the Result Summary) says what this is
 actually about.

 2.) There are very important databases missing. For in-memory use, e.g.
 Redis. If e.g. Redis is not a valid candidate in this race, why is this
 so? MySQL is capable of working as an in-memory distributed database, too.

 3.) The methodology is unclear. Perhaps I'm the only one, but what does
 "Run workload for 30 minutes (workload file workload[1-5])" mean for mixed
 read/write ops? Why 30 min? Okay, I can imagine that the authors estimated
 the throughput, preset the number of rows to 100 million and designed it
 to be larger than the estimated throughput in x minutes. However, all this
 information is missing. And why 45% and 22% of RAM? My first idea would be
 a VERY low ratio, like 2% or so, and a VERY large ratio, like 80-90%, and
 then everything in between. Is 22% or 45% somehow a magic number?
 Furthermore, in the Result Summary 1/2 and 1/4 of RAM are discussed.
 Okay, 22% is near 1/4 ... but where does the difference come from? And
 btw. ... 22% of what? Stuff to insert? Stuff already inserted? It's all
 deducible, but it's strange that the description is so sloppy.

 4.) There is no repetition of the loads (as I understand). It's one run,
 one result ... and it's done. I don't know a lot about Cassandra in
 in-memory use. But either the experiment should be repeated for quite
 some runs OR it should be explained why this is not necessary.

 Okay, perhaps 1 is a little picky, and 4 is a little fussy. But 3 is
 strange and 2 stinks.

 Well, just my first impression. And that Cassandra is very fast ;).

 Best regards

 Wilm


 On 19.12.2014 at 06:41, diwayou wrote:

   I have just read this benchmark PDF. Does anyone have an opinion
 about it?
 I think it's not fair to Cassandra.
 url:
 http://www.bankmark.de/wp-content/uploads/2014/12/bankmark-20141201-WP-NoSQLBenchmark.pdf
 http://msrg.utoronto.ca/papers/NoSQLBenchmark





2014 nosql benchmark

2014-12-18 Thread diwayou
I have just read this benchmark PDF. Does anyone have an opinion about it?
I think it's not fair to Cassandra.
url: http://www.bankmark.de/wp-content/uploads/2014/12/bankmark-20141201-WP-NoSQLBenchmark.pdf
http://msrg.utoronto.ca/papers/NoSQLBenchmark

Re: 2014 nosql benchmark

2014-12-18 Thread Wilm Schumacher
Hi,

I'm always interested in such benchmark experiments, because the
databases evolve so fast that the race is always open and there is a
lot of motion in there.

And of course I asked myself the same question. And I think that this
publication is unreliable, for 4 reasons (from reading very fast,
perhaps there are more):

1.) It is unclear what this is all about. The title is "NoSQL
Performance Testing". The subtitle is "In-Memory Performance Comparison
of SequoiaDB, Cassandra, and MongoDB". However, in the introduction
there is not one word about in-memory performance. The introduction
could be a general introduction for a general on-disk NoSQL benchmark.
So ... only the subtitle (and a short sentence in the Result Summary)
says what this is actually about.

2.) There are very important databases missing. For in-memory use, e.g.
Redis. If e.g. Redis is not a valid candidate in this race, why is this
so? MySQL is capable of working as an in-memory distributed database,
too.

3.) The methodology is unclear. Perhaps I'm the only one, but what does
"Run workload for 30 minutes (workload file workload[1-5])" mean for
mixed read/write ops? Why 30 min? Okay, I can imagine that the authors
estimated the throughput, preset the number of rows to 100 million and
designed it to be larger than the estimated throughput in x minutes.
However, all this information is missing. And why 45% and 22% of RAM? My
first idea would be a VERY low ratio, like 2% or so, and a VERY large
ratio, like 80-90%, and then everything in between. Is 22% or 45%
somehow a magic number? Furthermore, in the Result Summary 1/2 and 1/4
of RAM are discussed. Okay, 22% is near 1/4 ... but where does the
difference come from? And btw. ... 22% of what? Stuff to insert? Stuff
already inserted? It's all deducible, but it's strange that the
description is so sloppy.

4.) There is no repetition of the loads (as I understand). It's one run,
one result ... and it's done. I don't know a lot about Cassandra in
in-memory use. But either the experiment should be repeated for quite
some runs OR it should be explained why this is not necessary.
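
As an illustration of point 4, here is a minimal Python sketch of repeating a run and reporting the spread instead of a single number. The helper `repeat_benchmark` and the stand-in throughput values are hypothetical; a real harness would call the actual load generator where `run_once` is used.

```python
import statistics
from typing import Callable, Dict, List

def repeat_benchmark(run_once: Callable[[], float], repetitions: int = 5) -> Dict:
    """Call `run_once` several times and summarize the measured throughput.

    `run_once` is a hypothetical callable that performs one full benchmark
    run and returns its throughput (e.g. operations per second).
    """
    if repetitions < 2:
        raise ValueError("need at least 2 repetitions to estimate the spread")
    samples: List[float] = [run_once() for _ in range(repetitions)]
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation
    return {
        "samples": samples,
        "mean": mean,
        "stdev": stdev,
        "rel_stdev": stdev / mean if mean else float("nan"),
    }

# Stand-in for a real run: replays made-up throughput numbers.
made_up_throughputs = iter([41000.0, 39500.0, 40200.0, 43100.0, 38900.0])
summary = repeat_benchmark(lambda: next(made_up_throughputs), repetitions=5)
print(f"mean={summary['mean']:.0f} ops/s, "
      f"stdev={summary['stdev']:.0f}, rel_stdev={summary['rel_stdev']:.1%}")
```

If the relative standard deviation is of the same order as the differences between the databases, a single run cannot rank them.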

Okay, perhaps 1 is a little picky, and 4 is a little fussy. But 3 is
strange and 2 stinks.

Well, just my first impression. And that Cassandra is very fast ;).

Best regards

Wilm


On 19.12.2014 at 06:41, diwayou wrote:
 I have just read this benchmark PDF. Does anyone have an opinion
 about it?
 I think it's not fair to Cassandra.
 url: http://www.bankmark.de/wp-content/uploads/2014/12/bankmark-20141201-WP-NoSQLBenchmark.pdf
 http://msrg.utoronto.ca/papers/NoSQLBenchmark