Well, that error indicates that it was unable to launch the daemon on witch3 for some reason. If you look at the error reported by bash, you will see that the "orted" binary wasn't found!
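One way to confirm this is to ask the non-interactive shell on the remote node whether it can resolve orted. The exit status 127 in the output is the shell's usual code for "command not found", which points the same way. The sketch below is only an illustration: has_binary is a made-up helper name (not part of Open MPI), and it assumes the daemons are started over ssh, that passwordless ssh works, and that the remote shell is POSIX-style (bash, judging by the error).

```shell
#!/bin/sh
# has_binary NAME RUNNER...
# Hypothetical helper (not part of Open MPI). Runs `command -v NAME` through
# RUNNER, which must accept a command string as its last argument, the way
# `ssh host` or `sh -c` do. Returns 0 if NAME resolves, non-zero otherwise.
has_binary() {
    name=$1
    shift
    "$@" "command -v $name" >/dev/null 2>&1
}

# Local sanity check: `sh -c` stands in for the remote shell here.
if has_binary sh sh -c; then
    echo "sh: found locally"
fi

# Against the nodes from hostfile2 (left commented out; needs working ssh).
# A failed ssh connection also yields non-zero, so the message covers both.
#   for node in witch2 witch3; do
#       has_binary orted ssh -o BatchMode=yes "$node" \
#           || echo "$node: orted not found on the non-interactive PATH (or ssh failed)"
#   done
```

If orted does turn out to be missing from the PATH on witch3, mpirun's --prefix option is one way to tell the remote side where the installation lives, assuming the same install tree exists on that node.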
Sounds like a path error. You might check to see if witch3 has the binaries installed, and if they are where you told the system to look...

Ralph

On 6/30/08 5:21 AM, "Lenny Verkhovsky" <lenny.verkhov...@gmail.com> wrote:

> I am not familiar with the IBM spawn test, but maybe this is the right behavior:
> if the spawn test allocates 3 ranks on the node, and then allocates another 3,
> then this test is supposed to fail due to max_slots=4.
>
> But it fails with the following hostfile as well BUT WITH A DIFFERENT ERROR.
>
> #cat hostfile2
> witch2 slots=4 max_slots=4
> witch3 slots=4 max_slots=4
> witch1:/home/BENCHMARKS/IBM # /home/USERS/lenny/OMPI_ORTE_18772/bin/mpirun -np
> 3 -hostfile hostfile2 dynamic/spawn
> bash: orted: command not found
> [witch1:22789]
> --------------------------------------------------------------------------
> A daemon (pid 22791) died unexpectedly with status 127 while attempting
> to launch so we are aborting.
> There may be more information reported by the environment (see above).
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> [witch1:22789]
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> witch3 - daemon did not report back when launched
>
> On Mon, Jun 30, 2008 at 9:38 AM, Lenny Verkhovsky <lenny.verkhov...@gmail.com>
> wrote:
>> Hi,
>> trying to run mtt I failed to run IBM spawn test. It fails only when using
>> hostfile, and not when using host list.
>> ( OMPI from TRUNK )
>>
>> This is working :
>> #mpirun -np 3 -H witch2 dynamic/spawn
>>
>> This Fails:
>> # cat hostfile
>> witch2 slots=4 max_slots=4
>> #mpirun -np 3 -hostfile hostfile dynamic/spawn
>> [witch1:12392]
>> --------------------------------------------------------------------------
>> There are not enough slots available in the system to satisfy the 3 slots
>> that were requested by the application:
>> dynamic/spawn
>>
>> Either request fewer slots for your application, or make more slots available
>> for use.
>> --------------------------------------------------------------------------
>> [witch1:12392]
>> --------------------------------------------------------------------------
>> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>> launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>>
>> Using hostfile1 also works
>> #cat hostfile1
>> witch2
>> witch2
>> witch2
>>
>>
>> Best Regards
>> Lenny.
>>