This is a question about ompi-tests/ibm/dynamic. Some of these tests
(spawn, spawn_multiple, loop_spawn/child, and no-disconnect) exercise
MPI_Comm_spawn* functionality. Specifically, they spawn additional
processes (beyond the initial mpirun launch) and therefore exert a
different load on a test system than one might naively expect from the
"mpirun -np <np>" command line.
One approach to testing is to have the test harness know the
characteristics of individual tests like these. E.g., if I have only 8
processors and I don't want to oversubscribe, the test harness would
know that particular tests should be launched with fewer processes. On
the other hand, building such generality into a test harness may not
make much sense when the changes would have to be so pervasive
(subjective assessment) and so few tests require it.
Another approach would be to manage oversubscription in the tests
themselves. E.g., for spawn.c, instead of spawning np new processes, do
the following:
- idle np/2 of the processes
- have the remaining np/2 processes spawn np/2 new ones
(Okay, so that leaves open the possibility that the newly spawned
processes might not appear on the same nodes where idled processes have
"made room" for them. Each solution seems loaded with shortcomings.)
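To make the bookkeeping concrete, here is a rough sketch of how the split above might be computed. The names spawn_plan, make_spawn_plan, and rank_is_idle are hypothetical and do not exist in ompi-tests; the MPI calls are only indicated in comments, and I'm assuming the lowest ranks stay active and that odd np rounds the idle/spawned count down.

```c
#include <stdbool.h>

/* Hypothetical helper types and functions; not part of ompi-tests. */
struct spawn_plan {
    int spawners;  /* ranks that stay active and would call MPI_Comm_spawn */
    int idlers;    /* ranks that sit out to "make room" */
    int children;  /* number of new processes to spawn */
};

/* Split np launched processes so that spawners + children never exceeds np.
 * Assumption: for odd np, the idle/spawned count is rounded down. */
struct spawn_plan make_spawn_plan(int np)
{
    struct spawn_plan plan;

    if (np < 0) {
        np = 0;  /* treat nonsensical input as "no processes" */
    }
    plan.idlers   = np / 2;
    plan.children = np / 2;
    plan.spawners = np - plan.idlers;
    return plan;
}

/* Assumption: the lowest-numbered ranks stay active, the rest idle.
 * In a real test the result could serve as the color for MPI_Comm_split,
 * and plan.children as maxprocs for MPI_Comm_spawn on the communicator
 * containing only the active ranks. */
bool rank_is_idle(int np, int rank)
{
    struct spawn_plan plan = make_spawn_plan(np);
    return rank >= plan.spawners;
}
```

With np = 8 this gives 4 active ranks, 4 idle ranks, and 4 spawned processes. With np = 1 there is no room to spawn anything (children is 0), so a test using this scheme would presumably have to skip itself in that case.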
Anyhow, I am interested in some feedback on this topic. A very small
number (1-4) of spawning tests are causing us lots of problems (undue
complexity in the test harness, as well as a bunch of our time for
reasons I find difficult to explain succinctly). We're inclined to
modify the tests so that they're a little more social, e.g., have each
test decide how many of the launched processes should "really" be
used, idle the rest, and continue the test only with the remaining
fraction.
Comments?