I wonder what companies like Amazon, Cloudera, Rackspace, Facebook, Yahoo, etc. use for benchmarking. I guess GridMix v3 might be of more interest to Yahoo.
I would appreciate it if someone could comment more on this.

Thanks,
-Shrinivas

On Fri, Feb 18, 2011 at 4:50 PM, Konstantin Boudnik <[email protected]> wrote:
> On Fri, Feb 18, 2011 at 14:35, Ted Dunning <[email protected]> wrote:
> > I just read the MalStone report. They report times for a Java version that
> > is many (5x) times slower than for a streaming implementation. That single
> > fact indicates that the Java code is so appallingly bad that this is a very
> > bad benchmark.
>
> Slow Java code? That's funny ;) Running with HotSpot on by any chance?
>
> > On Fri, Feb 18, 2011 at 2:27 PM, Jim Falgout <[email protected]> wrote:
> >
> >> We use MalStone and TeraSort. For Hive, you can use TPC-H, at least the
> >> data and the queries, if not the query generator. There is a Jira issue in
> >> Hive that discusses the TPC-H "benchmark" if you're interested. Sorry, I
> >> don't remember the issue number offhand.
> >>
> >> -----Original Message-----
> >> From: Shrinivas Joshi [mailto:[email protected]]
> >> Sent: Friday, February 18, 2011 3:32 PM
> >> To: [email protected]
> >> Subject: benchmark choices
> >>
> >> Which workloads are used for serious benchmarking of Hadoop clusters? Do
> >> you care about any of the following workloads:
> >> TeraSort, GridMix v1, v2, or v3, MalStone, CloudBurst, MRBench, NNBench,
> >> sample apps shipped with the Hadoop distro like PiEstimator, dbcount, etc.
> >>
> >> Thanks,
> >> -Shrinivas
