We use MalStone and TeraSort. For Hive, you can use TPC-H, at least the data and the queries, if not the query generator. There is a Jira issue in Hive that discusses the TPC-H "benchmark" if you're interested. Sorry, I don't remember the issue number offhand.
-----Original Message-----
From: Shrinivas Joshi [mailto:[email protected]]
Sent: Friday, February 18, 2011 3:32 PM
To: [email protected]
Subject: benchmark choices

Which workloads are used for serious benchmarking of Hadoop clusters? Do you care about any of the following workloads: TeraSort, GridMix v1, v2, or v3, MalStone, CloudBurst, MRBench, NNBench, or sample apps shipped with the Hadoop distro such as PiEstimator, dbcount, etc.?

Thanks,
-Shrinivas
