On Thu, Jul 19, 2007 at 08:07:51AM -0600, Ralph H Castain wrote:
> Interesting. Apparently, it is getting a NULL back when it tries to access
> the LD_LIBRARY_PATH in your environment. Here is the code involved:
>
>     newenv = opal_os_path( false, prefix_dir, lib_base, NULL );
>     oldenv = getenv("LD_LIBRARY_PATH");
>     if (NULL != oldenv) {
>         char* temp;
>         asprintf(&temp, "%s:%s", newenv, oldenv);
>         free(newenv);
>         newenv = temp;
>     }
>     opal_setenv("LD_LIBRARY_PATH", newenv, true, &env);
>     if (mca_pls_rsh_component.debug) {
>         opal_output(0, "pls:rsh: reset LD_LIBRARY_PATH: %s", newenv);
>     }
>     free(newenv);
>
> So you can see that the only way we can get your debugging output is for the
> LD_LIBRARY_PATH in your starting environment to be NULL. Note that this
> comes after we fork, so we are talking about the child process - not sure
> that matters, but may as well point it out.
>
> So the question is: why do you not have LD_LIBRARY_PATH set in your
> environment when you provide a different hostname?
Right, I don't have LD_LIBRARY_PATH set in my environment, but I expect
mpirun to provide a working environment for all ranks, not just remote
ones. This is how it worked before. Perhaps that was a bug, but it was a
useful bug :)
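To make the two cases concrete, here is a minimal standalone sketch of the
same prepend logic using only POSIX calls. It is not the Open MPI code:
prepend_ld_library_path is a hypothetical helper name, and plain
getenv/setenv/snprintf stand in for opal_os_path and opal_setenv. It only
illustrates that the unset case still ends up with a usable value.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() under strict -std= modes */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI function): put `dir` at the front
 * of LD_LIBRARY_PATH, keeping any existing value after it.
 * Returns 0 on success, -1 on failure. */
static int prepend_ld_library_path(const char *dir)
{
    assert(dir != NULL);

    const char *oldenv = getenv("LD_LIBRARY_PATH");
    if (oldenv == NULL) {
        /* Variable is unset: the new value is just `dir`. */
        return setenv("LD_LIBRARY_PATH", dir, 1);
    }

    /* Room for "<dir>:<old>" plus the terminating NUL. */
    size_t len = strlen(dir) + 1 + strlen(oldenv) + 1;
    char *newenv = malloc(len);
    if (newenv == NULL) {
        return -1;
    }
    snprintf(newenv, len, "%s:%s", dir, oldenv);

    /* setenv() copies the string, so the buffer can be freed afterwards. */
    int rc = setenv("LD_LIBRARY_PATH", newenv, 1);
    free(newenv);
    return rc;
}
```

With LD_LIBRARY_PATH unset, calling
prepend_ld_library_path("/home/glebn/openmpi/lib") leaves the variable equal
to that directory alone, which matches the value in the "reset
LD_LIBRARY_PATH" debug line of the good run.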
>
>
> On 7/19/07 7:45 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
>
> > On Wed, Jul 18, 2007 at 09:08:38PM +0300, Gleb Natapov wrote:
> >> On Wed, Jul 18, 2007 at 09:08:47AM -0600, Ralph H Castain wrote:
> >>> But this will lockup:
> >>>
> >>> pn1180961:~/openmpi/trunk rhc$ mpirun -n 1 -host pn1180961 printenv | grep LD
> >>>
> >>> The reason is that the hostname in this last command doesn't match the
> >>> hostname I get when I query my interfaces, so mpirun thinks it must be a
> >>> remote host - and so we stick in ssh until that times out. Which could be
> >>> quick on your machine, but takes awhile for me.
> >>>
> >> This is not my case. mpirun resolves hostname and runs env but
> >> LD_LIBRARY_PATH is not there. If I use full name like this
> >> # /home/glebn/openmpi/bin/mpirun -np 1 -H elfit1.voltaire.com env | grep LD_LIBRARY_PATH
> >> LD_LIBRARY_PATH=/home/glebn/openmpi/lib
> >>
> >> everything is OK.
> >>
> > More info. If I provide hostname to mpirun as returned by command
> > "hostname" the LD_LIBRARY_PATH is not set:
> > # /home/glebn/openmpi/bin/mpirun -np 1 -H `hostname` env | grep LD
> > OLDPWD=/home/glebn/OpenMPI/ompi-tests/intel_tests
> >
> > if I provide any other name that resolves to the same IP then
> > LD_LIBRARY_PATH is set.
> > # /home/glebn/openmpi/bin/mpirun -np 1 -H localhost env | grep LD
> > OLDPWD=/home/glebn/OpenMPI/ompi-tests/intel_tests
> > LD_LIBRARY_PATH=/home/glebn/openmpi/lib
> >
> > Here is debug output of "bad" run:
> > /home/glebn/openmpi/bin/mpirun -np 1 -H `hostname` -mca pls_rsh_debug 1 echo
> > [elfit1:14730] pls:rsh: launching job 1
> > [elfit1:14730] pls:rsh: no new daemons to launch
> >
> > Here is good one:
> > /home/glebn/openmpi/bin/mpirun -np 1 -H localhost -mca pls_rsh_debug 1 echo
> > [elfit1:14752] pls:rsh: launching job 1
> > [elfit1:14752] pls:rsh: local csh: 0, local sh: 1
> > [elfit1:14752] pls:rsh: assuming same remote shell as local shell
> > [elfit1:14752] pls:rsh: remote csh: 0, remote sh: 1
> > [elfit1:14752] pls:rsh: final template argv:
> > [elfit1:14752] pls:rsh: /usr/bin/ssh <template> orted --name <template>
> > --num_procs 1 --vpid_start 0 --nodename <template> --universe
> > root@elfit1:default-universe-14752 --nsreplica
> > "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" --gprreplica
> > "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" -mca
> > mca_base_param_file_path
> > /home/glebn/openmpi//share/openmpi/amca-param-sets:/home/USERS/glebn/openmpiwd
> > -mca mca_base_param_file_path_force /home/USERS/glebn/openmpiwd
> > [elfit1:14752] pls:rsh: launching on node localhost
> > [elfit1:14752] pls:rsh: localhost is a LOCAL node
> > [elfit1:14752] pls:rsh: reset PATH:
> > /home/glebn/openmpi/bin:/home/USERS/lenny/MPI/mpi/bin:/opt/vltmpi/OPENIB/mpi/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
> > [elfit1:14752] pls:rsh: reset LD_LIBRARY_PATH: /home/glebn/openmpi/lib
> > [elfit1:14752] pls:rsh: changing to directory /root
> > [elfit1:14752] pls:rsh: executing: (/home/glebn/openmpi/bin/orted) [orted
> > --name 0.0.1 --num_procs 1 --vpid_start 0 --nodename localhost --universe
> > root@elfit1:default-universe-14752 --nsreplica
> > "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" --gprreplica
> > "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" -mca
> > mca_base_param_file_path
> > /home/glebn/openmpi//share/openmpi/amca-param-sets:/home/USERS/glebn/openmpiwd
> > -mca mca_base_param_file_path_force /home/USERS/glebn/openmpiwd --set-sid]
> >
> > --
> > Gleb.
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>

--
			Gleb.