Erik,

Very useful info. As you may know (I saw the reference to BDBC on your page), we are organizing the fourth Workshop on Big Data Benchmarking on Oct 9-10 in San Jose (http://clds.ucsd.edu/bdbc/workshops/fourth_wbdb). In this workshop, we hope to get closer to settling on a definitive Big Data benchmark. Several efforts are underway, and we hope to bring them together under the TPC umbrella. In particular, the most promising candidate currently is BigBench (http://dl.acm.org/citation.cfm?id=2463712).
It would be great if you could attend this workshop and present your views.

Thanks,

- milind

---
Milind Bhandarkar
Chief Scientist
Pivotal
+1-650-523-3858 (W)
+1-408-666-8483 (C)

On Wed, Sep 4, 2013 at 2:27 PM, Erik Paulson <epaul...@unit1127.com> wrote:

> Hello all -
>
> As part of a side project, I've been interested in HDFS benchmarking,
> particularly of the Namenode. To get started, I tried to track down a
> number of different benchmarks and collect a few observations about each.
> I've put together a list here:
>
> http://epaulson.github.io/HadoopInternals/benchmarks.html
>
> The benchmarks I included were:
> DFSIO
> DFSIO-e
> NNBench and NNBenchWithoutMR
> S-Live
> LoadGenerator
> NNThroughputBenchmark
> TestEditLog
> MStress, from Quantcast
> Ohio State Microbenchmarks
> SWIM
>
> (I also wrote a bit about what else I'd like to see in a NN benchmark)
>
> I'd appreciate any corrections, feedback, and pointers to code that I
> missed!
>
> Thanks!
>
> -Erik