I just read the MalStone report. It gives times for a Java version that is many times (5x) slower than a streaming implementation. Streaming normally adds overhead on top of native Java, so a result that far in the other direction indicates that the Java code is so appallingly bad that this is a very bad benchmark.
On Fri, Feb 18, 2011 at 2:27 PM, Jim Falgout <[email protected]> wrote:

> We use MalStone and TeraSort. For Hive, you can use TPC-H, at least the
> data and the queries, if not the query generator. There is a Jira issue in
> Hive that discusses the TPC-H "benchmark" if you're interested. Sorry, I
> don't remember the issue number offhand.
>
> -----Original Message-----
> From: Shrinivas Joshi [mailto:[email protected]]
> Sent: Friday, February 18, 2011 3:32 PM
> To: [email protected]
> Subject: benchmark choices
>
> Which workloads are used for serious benchmarking of Hadoop clusters? Do
> you care about any of the following workloads :
> TeraSort, GridMix v1, v2, or v3, MalStone, CloudBurst, MRBench, NNBench,
> sample apps shipped with Hadoop distro like PiEstimator, dbcount etc.
>
> Thanks,
> -Shrinivas
