Today I've also seen this benchmark on Chinese websites. SequoiaDB seems
to come from a Chinese startup company, and in the db-engines ranking
http://db-engines.com/en/ranking its score is 0.00. So IMO this
benchmark is a soft sell. They compare three databases, two written in
C++ and one in Java, and use a very tricky test case so that Cassandra
cannot hold all the data in memtables. After all, Java needs more memory
than C++. For an on-disk database, the data size of one node is generally
much larger than RAM, and its in-memory query performance is less
important than its on-disk query performance.
So I think this benchmark has no value at all.
2014-12-19 14:47 GMT+08:00 Wilm Schumacher wilm.schumac...@gmail.com:
Hi,
I'm always interested in such benchmark experiments, because the
databases evolve so fast that the race is always open and there is a lot
of motion in there.
And of course I asked myself the same question. And I think that this
publication is unreliable, for 4 reasons (from reading very fast, perhaps
there are more):
1.) It is unclear what this is all about. The title is "NoSQL Performance
Testing". The subtitle is "In-Memory Performance Comparison of SequoiaDB,
Cassandra, and MongoDB". However, in the introduction there is not one
word about in-memory performance. The introduction could be a general
introduction for a general on-disk NoSQL benchmark. So ... only the
subtitle (and a short sentence in the "Result Summary") says what this is
actually about.
2.) Very important databases are missing. For in-memory use, e.g.
Redis. If Redis is not a valid candidate in this race, why is that
so? MySQL is capable of distributed in-memory database operation, too.
3.) The methodology is unclear. Perhaps I'm the only one, but what does
"Run workload for 30 minutes (workload file workload[1-5])" mean for mixed
read/write ops? Why 30 min? Okay, I can imagine that the authors estimated
the throughput, preset the number of 100 million rows and designed it to
be larger than the estimated throughput in x minutes. However, all this
information is missing. And why 45% and 22% of RAM? My first idea would be
a VERY low ratio, like 2% or so, and a VERY large ratio, like 80-90%, and
then everything in between. Is 22% or 45% somehow a magic number?
Furthermore, in the "Result Summary" 1/2 and 1/4 of RAM are discussed.
Okay, 22% is near 1/4 ... but where does the difference come from? And
btw. ... 22% of what? Stuff to insert? Stuff already inserted? It's all
deducible, but it's strange that the description is so sloppy.
4.) There is no repetition of the loads (as I understand it). It's one
run, one result ... and it's done. I don't know a lot about Cassandra in
in-memory use. But either the experiment should be repeated for quite some
runs OR it should be explained why this is not necessary.
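To make point 4 concrete, here is a minimal sketch of how repeated runs
could be summarized, so that a single number is not mistaken for a stable
result. The function name summarize_runs and the input values are my own
assumptions for illustration. The values are invented placeholders, not
measurements of any database, and nothing here is taken from the paper.

```python
import statistics


def summarize_runs(throughputs):
    """Summarize throughput (ops/sec) from repeated runs of one workload.

    Returns the number of runs, the mean, the sample standard deviation
    and the coefficient of variation (stdev / mean).
    """
    if len(throughputs) < 2:
        # A single run gives no information about run-to-run spread.
        raise ValueError("need at least two runs to estimate the spread")
    mean = statistics.mean(throughputs)
    stdev = statistics.stdev(throughputs)  # sample standard deviation
    return {
        "runs": len(throughputs),
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,
    }


if __name__ == "__main__":
    # Placeholder values, invented purely to exercise the function.
    placeholder_runs = [1000.0, 1100.0, 900.0]
    print(summarize_runs(placeholder_runs))
```

If the coefficient of variation over several runs is of the same size as
the gap between two databases, the ranking from one run says very little.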
Okay, perhaps 1 is a little picky, and 4 is a little fussy. But 3 is
strange and 2 stinks.
Well, just my first impression. And that Cassandra is very fast ;).
Best regards
Wilm
On 19.12.2014 at 06:41, diwayou wrote:
I have just read this benchmark PDF. Does anyone have an opinion
about it?
I think it's not fair to Cassandra.
url:
http://www.bankmark.de/wp-content/uploads/2014/12/bankmark-20141201-WP-NoSQLBenchmark.pdf
http://msrg.utoronto.ca/papers/NoSQLBenchmark