Yes. If you look at the README, gridmix-env, and the generateData script, you should be able to alter the job mix to match your requirements. In particular, you probably want to look closely at the number of small, medium, and large jobs for each run. For a three-node cluster, you might want to try running only the small jobs (possibly the medium jobs). Note that you don't have to generate the entropy dataset if you don't plan on running any large jobs (what it tests is not interesting on three nodes anyway).

Note that the "real" dataset is 1000 times larger than what generateData produces by default; a smaller dataset may let you keep the total number of jobs up, though you should also be wary of the load on the submitting node (see submissionScripts/sleep_if_too_busy). Keep in mind that each node may also store (possibly uncompressed) copies of the datasets as intermediate map outputs, so budgeting for local disk space will also be important while gridmix runs, particularly for "medium" jobs.

Good luck. -C
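To make the advice above concrete, a scaled-down run amounts to editing a few settings in gridmix-env before invoking generateData and the submission scripts. The variable names below are illustrative placeholders, not confirmed GridMix settings — check your own copy of gridmix-env for the actual names your version uses:

```shell
# Sketch of the kind of edits described above (a config fragment,
# sourced by the gridmix scripts; variable names are hypothetical).

# Shrink the generated dataset further. generateData's default is
# already 1/1000th of the "real" dataset; a small cluster can go lower.
export COMPRESSED_DATA_BYTES=$((2 * 1024 * 1024 * 1024))   # ~2 GB; hypothetical knob

# Run only small (and perhaps a few medium) jobs on a 3-node cluster.
# Setting large jobs to zero also means the entropy dataset is not needed.
export NUM_OF_SMALL_JOBS=8     # hypothetical name
export NUM_OF_MEDIUM_JOBS=2    # hypothetical name
export NUM_OF_LARGE_JOBS=0     # hypothetical name
```

With the large-job count at zero, the entropy-dataset step of generateData can be skipped entirely, and local-disk budgeting only has to cover the small/medium intermediate map outputs.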

On Sep 17, 2008, at 3:27 PM, Joel Welling wrote:

Hi folks;
 I'd like to try the gridmix benchmark on my small cluster (3 nodes at
8 cores each, Lustre with IB interconnect).  The documentation for
gridmix suggests that it will take 4 hours on a 500 node cluster, which
suggests it would take me something like a week to run. Is there a way
to scale the problem size back?  I don't mind the file size too much,
but the running time would be excessive if things scale linearly with
the number of nodes.

Thanks,
-Joel
