Thanks for reporting this. I just committed code to the rsh PLS
component to specifically check $bindir if orted is not found in your
PATH (on the local node). If orted is still not found, it'll now issue
a friendly error message:
[7:58] vogon:~/mpi % mpirun -np 1 hello
--------------------------------------------------------------------------
The rsh PLS component was not able to find the executable "orted" in
your PATH or in the directory where Open MPI was initially installed,
and therefore cannot continue.
For reference, your current PATH is:
/opt/torque/bin:/u/jsquyres/bogus/bin:/home/jsquyres/local/bin:/u/jsquyres/local/bin:/l/osl/Software/i686-pc-linux-gnu/bin:/usr/local/gnu/bin:/usr/local/bin:/usr/local/bin:/opt/bin:/usr/i686-pc-linux-gnu/gcc-bin/3.3.5-20050130:/opt/intel_cc_80/bin:/opt/intel_idb_80/bin:/opt/intel_fc_80/bin:/opt/ICAClient:/opt/stuffit/bin:/opt/sun-jdk-1.4.2.08/bin:/opt/sun-jdk-1.4.2.08/jre/bin:/opt/sun-jdk-1.4.2.08/jre/javaws:/usr/qt/3/bin:/usr/kde/3.3/bin:/usr/qt/2/bin:/opt/vmware/bin:/opt/aim:/bin:/usr/bin:/opt/absoft/bin
We also looked for orte in the following directory:
/u/jsquyres/bogus/bin
--------------------------------------------------------------------------
[0,0,0] ORTE_ERROR_LOG: ORTE_ERR_NOT_FOUND in file rmgr_urm.c at line 320
mpirun: spawn failed with errno=-16
ERROR: A daemon on node vogon failed to start as expected.
ERROR: There may be more information available from
ERROR: the remote shell (see above).
ERROR: The daemon exited unexpectedly with status 240.
[7:59] vogon:~/mpi %
The message also prints your current $PATH, so that problems like the
one you ran into (some other agent changing your PATH to something
that you didn't expect) are more obvious.
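For illustration, the lookup order described above (PATH first, then the
install directory) might be sketched roughly as follows. This is not the
committed code: find_executable() and try_dir() are hypothetical names
made up for this example, and the real component goes through
opal_path_findv(), whose signature is not reproduced here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not an Open MPI function): build "<dir>/<name>"
 * and return it as a malloc'ed string if access() reports it executable.
 * Note that access(X_OK) alone does not rule out directories. */
static char *try_dir(const char *dir, size_t dirlen, const char *name)
{
    if (dirlen == 0) {          /* an empty PATH entry means "." */
        dir = ".";
        dirlen = 1;
    }
    size_t size = dirlen + 1 + strlen(name) + 1;
    char *candidate = malloc(size);
    if (candidate == NULL) {
        return NULL;
    }
    snprintf(candidate, size, "%.*s/%s", (int)dirlen, dir, name);
    if (access(candidate, X_OK) != 0) {
        free(candidate);
        return NULL;
    }
    return candidate;
}

/* Hypothetical helper: search the colon-separated `path` for `name`;
 * if that fails, try `fallback_dir` (standing in for $bindir).
 * Returns a malloc'ed full path, or NULL if nothing was found. */
char *find_executable(const char *name, const char *path,
                      const char *fallback_dir)
{
    if (path != NULL) {
        const char *p = path;
        for (;;) {
            const char *colon = strchr(p, ':');
            size_t len = colon ? (size_t)(colon - p) : strlen(p);
            char *found = try_dir(p, len, name);
            if (found != NULL) {
                return found;
            }
            if (colon == NULL) {
                break;
            }
            p = colon + 1;
        }
    }
    if (fallback_dir != NULL) {
        return try_dir(fallback_dir, strlen(fallback_dir), name);
    }
    return NULL;
}
```

A caller would pass something like getenv("PATH") and the install
directory, and print the error message shown above when the result is
NULL.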
On Jul 27, 2005, at 12:50 PM, Greg Watson wrote:
Hi all,
To recap: the problem was that if orted was launched from Eclipse (on
OS X) then subsequent attempts to run a program (using mpirun or
whatever) returned immediately. If orted was launched from anywhere
else (java, command line, etc.) it worked fine.
Turning on daemon logging showed that the program was aborting
immediately because the execv() of the ssh command to the
remote machine was failing with errno=14 (EFAULT). Clearly there was
some environment difference, and after much checking it became
apparent that the difference was that the Eclipse-launched orted did
not have $(OMPI_INSTALL) in its path. The orte_pls_rsh_launch()
function checks if you're launching onto the local or a remote
machine. For local machines (as it was in this case), it calls
opal_path_findv() to find the local path of orted. Unfortunately
because $(OMPI_INSTALL) is not included in the local path, this fails
by returning NULL. The NULL is then passed as the first argument of
execv(), which fails with errno set to EFAULT.
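The missing check could look something like the sketch below.
checked_execv() is a hypothetical wrapper invented for this example, not
a function in the Open MPI tree; the point is only that a NULL lookup
result gets reported before execv() is ever called, rather than
surfacing later as an EFAULT.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical wrapper (not part of Open MPI): refuse to call execv()
 * when the path lookup returned NULL. Returns -1 on any failure; on
 * success execv() replaces the process and never returns. */
int checked_execv(const char *exec_path, char *const argv[])
{
    if (exec_path == NULL) {
        fprintf(stderr, "cannot launch %s: executable not found\n",
                (argv != NULL && argv[0] != NULL) ? argv[0] : "(unknown)");
        errno = ENOENT;     /* report "not found" instead of EFAULT */
        return -1;
    }
    execv(exec_path, argv);
    return -1;              /* reached only if execv() itself failed */
}
```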
The problem is easily reproducible by taking $(OMPI_INSTALL) out of
your path, running $(OMPI_INSTALL)/orted, then trying to run
something with mpirun.
Why did it work from the command line? On OS X, the shell gets the
PATH set in ~/.bash_profile, etc. (which in this case contained
OMPI_INSTALL), but applications launched from the window system get
their path from the loginwindow app, which looks in ~/.MacOSX/
environment.plist for environment variables (which didn't contain
OMPI_INSTALL). I suspect, but haven't tried, that launching Eclipse
from the command line would have worked.
I'm not sure why the logic is there to look up the path again for
local launches, since it should be the same as the path in the
component. It should certainly check for a NULL return though.
Greg
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/