Owen O'Malley wrote:
> On Jul 21, 2009, at 8:28 AM, Ted Dunning wrote:
>> There are already several such efforts.
>> Pig has PigMix
>> Hadoop has terasort and likely some others as well.
>
> Hadoop has the terasort, and grid mix. There is even a new version of
> the grid mix coming out. Look at:
> https://issues.apache.org/jira/browse/MAPREDUCE-776

I've been using Paolo's PageRank implementation over the Citeseer DB as
a low-startup-cost, CPU-intensive test. This lets me compare different
individual machines (desktop, laptop, VMs) and, for the physical ones,
measure power consumption too. A single-machine test is nice for
measuring how different disk and CPU options perform, while ignoring
things like LAN setup.
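
For anyone wanting to try a similar single-machine, CPU-bound test, here is a rough sketch in Python. It is not Paolo's implementation and does not use the Citeseer data: the synthetic random graph, the function names (make_graph, pagerank, time_pagerank) and all parameter values are assumptions made purely for illustration. It only shows the general shape of such a test, which is to build a graph in memory, run a fixed number of PageRank iterations, and record the wall-clock time.

```python
import random
import time


def make_graph(n_nodes, out_degree, seed=0):
    """Build a synthetic directed graph as a list of out-link lists.

    This is a hypothetical stand-in for a real dataset such as Citeseer.
    """
    rng = random.Random(seed)
    return [[rng.randrange(n_nodes) for _ in range(out_degree)]
            for _ in range(n_nodes)]


def pagerank(links, damping=0.85, iterations=20):
    """Plain power-iteration PageRank over adjacency lists.

    Rank held by dangling nodes (no out-links) is spread uniformly.
    """
    n = len(links)
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        new_ranks = [(1.0 - damping) / n] * n
        dangling = 0.0
        for node, outs in enumerate(links):
            if outs:
                share = damping * ranks[node] / len(outs)
                for target in outs:
                    new_ranks[target] += share
            else:
                dangling += ranks[node]
        if dangling:
            bonus = damping * dangling / n
            new_ranks = [r + bonus for r in new_ranks]
        ranks = new_ranks
    return ranks


def time_pagerank(n_nodes=2000, out_degree=5, iterations=20):
    """Return (ranks, elapsed_seconds) for one in-memory run."""
    links = make_graph(n_nodes, out_degree)
    start = time.perf_counter()
    ranks = pagerank(links, iterations=iterations)
    elapsed = time.perf_counter() - start
    return ranks, elapsed


if __name__ == "__main__":
    ranks, elapsed = time_pagerank()
    print(f"{len(ranks)} nodes ranked in {elapsed:.3f} s")
```

Running the same script with the same seed and sizes on each machine gives comparable timings. The graph lives entirely in memory, so this variant exercises the CPU and memory rather than disk or network.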