Adding Roman Shaposhnik, who's "tasked" with benchmarking @Cloudera, to the list.
On Mon, Feb 21, 2011 at 12:39, Shrinivas Joshi <[email protected]> wrote:
> I wonder what companies like Amazon, Cloudera, RackSpace, Facebook, Yahoo
> etc. look at for the purpose of benchmarking. I guess GridMix v3 might be of
> more interest to Yahoo.
>
> I would appreciate it if someone could comment more on this.
>
> Thanks,
> -Shrinivas
>
> On Fri, Feb 18, 2011 at 4:50 PM, Konstantin Boudnik <[email protected]> wrote:
>>
>> On Fri, Feb 18, 2011 at 14:35, Ted Dunning <[email protected]> wrote:
>> > I just read the MalStone report. They report times for a Java version that
>> > is 5x slower than a streaming implementation. That single fact indicates
>> > that the Java code is so appallingly bad that this is a very bad
>> > benchmark.
>>
>> Slow Java code? That's funny ;) Running with Hotspot on by any chance?
>>
>> > On Fri, Feb 18, 2011 at 2:27 PM, Jim Falgout <[email protected]> wrote:
>> >
>> >> We use MalStone and TeraSort. For Hive, you can use TPC-H, at least the
>> >> data and the queries, if not the query generator. There is a Jira issue in
>> >> Hive that discusses the TPC-H "benchmark" if you're interested. Sorry, I
>> >> don't remember the issue number offhand.
>> >>
>> >> -----Original Message-----
>> >> From: Shrinivas Joshi [mailto:[email protected]]
>> >> Sent: Friday, February 18, 2011 3:32 PM
>> >> To: [email protected]
>> >> Subject: benchmark choices
>> >>
>> >> Which workloads are used for serious benchmarking of Hadoop clusters? Do
>> >> you care about any of the following workloads:
>> >> TeraSort, GridMix v1, v2, or v3, MalStone, CloudBurst, MRBench, NNBench,
>> >> sample apps shipped with the Hadoop distro like PiEstimator, dbcount, etc.
>> >>
>> >> Thanks,
>> >> -Shrinivas
