On May 16, 2011, at 11:03 AM, Evert Lammerts wrote:
> Hi all,
>
> What acceptance tests are people using when buying clusters for Hadoop? Any
> pointers to relevant methods?
We get some test nodes from various manufacturers and do some raw I/O benchmarking against our other nodes. We also add them to our various grids to see how they perform in the real world, paying attention to average task turnaround time for certain jobs. Since we know where our current machines stand, we can look at price-per-performance improvements.
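For the raw I/O piece, something like the stock TestDFSIO job is enough for an apples-to-apples comparison. The jar path below is a guess (the name varies by Hadoop version and distribution), so treat it as a sketch rather than a recipe:

    # Write then read 10 x 1GB files on the candidate nodes; note the
    # throughput and average IO rate lines TestDFSIO prints at the end.
    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
    hadoop jar $HADOOP_HOME/hadoop-*test*.jar TestDFSIO -clean

Run the same thing on the incumbent nodes and divide by cost: a box that turns tasks around 20% faster but costs 40% more still loses on price per performance.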
Other random things that I think are important:
a) Unless someone shares their entire *-site.xml configuration, published benchmarks on the net are mostly useless. Simple things like block size have a big impact (see the sketch after this list).
b) Test your actual workload. Synthetic benchmarks are just that--synthetic. They may not reflect the particular nuances of your jobs.
c) Establish a baseline. If you have no hardware today, then at least establish something on EC2 to compare against.
d) Make sure you talk to multiple vendors.
e) Any advice anyone gives you on config is likely going to be
wrong.
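To illustrate (a): the HDFS block size lives in hdfs-site.xml, and that one value alone can move a sort or scan benchmark enough to make someone else's published numbers meaningless for your setup. A minimal example, assuming a 0.20-era release where the property is still called dfs.block.size (later releases rename it dfs.blocksize; the value is in bytes):

    <!-- hdfs-site.xml: raise the HDFS block size from the 64MB default to 128MB -->
    <property>
      <name>dfs.block.size</name>
      <value>134217728</value>
    </property>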