You are correct - I misread the note. My bad.

I'll look at how we might ensure the LD_LIBRARY_PATH shows up correctly -
shouldn't be a big deal.


On 7/19/07 9:12 AM, "George Bosilca" <bosi...@cs.utk.edu> wrote:

> The second execution (the one you refer to) is the one that works
> fine. The failing one is the first one, where LD_LIBRARY_PATH is not
> provided. As Gleb indicated, using localhost makes the problem vanish.
> 
>    george.
> 
> On Jul 19, 2007, at 10:57 AM, Ralph H Castain wrote:
> 
>> But it *does* provide an LD_LIBRARY_PATH that points to your openmpi
>> installation - it says so right here in your debug output:
>> 
>>>>> [elfit1:14752] pls:rsh: reset LD_LIBRARY_PATH: /home/glebn/openmpi/lib
>> 
>> I suspect that the problem isn't in the launcher, but rather in the
>> iof again. Why don't we wait until those fixes come into the trunk
>> before chasing our tails any further?
>> 
>> 
>> On 7/19/07 8:18 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
>> 
>>> On Thu, Jul 19, 2007 at 08:07:51AM -0600, Ralph H Castain wrote:
>>>> Interesting. Apparently, it is getting a NULL back when it tries to
>>>> access the LD_LIBRARY_PATH in your environment. Here is the code
>>>> involved:
>>>> 
>>>>      newenv = opal_os_path( false, prefix_dir, lib_base, NULL );
>>>>      oldenv = getenv("LD_LIBRARY_PATH");
>>>>      if (NULL != oldenv) {
>>>>           char* temp;
>>>>           asprintf(&temp, "%s:%s", newenv, oldenv);
>>>>           free(newenv);
>>>>           newenv = temp;
>>>>      }
>>>>      opal_setenv("LD_LIBRARY_PATH", newenv, true, &env);
>>>>      if (mca_pls_rsh_component.debug) {
>>>>           opal_output(0, "pls:rsh: reset LD_LIBRARY_PATH: %s", newenv);
>>>>      }
>>>>      free(newenv);
>>>> 
>>>> So you can see that the only way we can get your debugging output is
>>>> for the LD_LIBRARY_PATH in your starting environment to be NULL. Note
>>>> that this comes after we fork, so we are talking about the child
>>>> process - not sure that matters, but may as well point it out.
>>>> 
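For reference, here is a minimal standalone sketch of that same prepend
logic (the hard-coded prefix and variable names are mine, purely
illustrative - this is not the tree code). Run it with and without
LD_LIBRARY_PATH set and you get exactly the two behaviors described above:

    #define _GNU_SOURCE               /* glibc: expose asprintf() */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        /* Stand-in for the opal_os_path() result in the real code. */
        char *newenv = strdup("/home/glebn/openmpi/lib");
        char *oldenv = getenv("LD_LIBRARY_PATH");

        if (NULL != oldenv) {         /* prepend only if something was set */
            char *temp;
            if (asprintf(&temp, "%s:%s", newenv, oldenv) < 0)
                return 1;
            free(newenv);
            newenv = temp;
        }
        /* With no LD_LIBRARY_PATH in the parent environment this prints
         * only the prefix path -- matching the debug output above. */
        printf("reset LD_LIBRARY_PATH: %s\n", newenv);
        free(newenv);
        return 0;
    }
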
>>>> So the question is: why do you not have LD_LIBRARY_PATH set in your
>>>> environment when you provide a different hostname?
>>> Right, I don't have LD_LIBRARY_PATH set in my environment, but I
>>> expect mpirun to provide a working environment for all ranks, not
>>> just the remote ones. This is how it worked before. Perhaps that was
>>> a bug, but it was a useful bug :)
>>> 
>>>> 
>>>> 
>>>> On 7/19/07 7:45 AM, "Gleb Natapov" <gl...@voltaire.com> wrote:
>>>> 
>>>>> On Wed, Jul 18, 2007 at 09:08:38PM +0300, Gleb Natapov wrote:
>>>>>> On Wed, Jul 18, 2007 at 09:08:47AM -0600, Ralph H Castain wrote:
>>>>>>> But this will lock up:
>>>>>>> 
>>>>>>> pn1180961:~/openmpi/trunk rhc$ mpirun -n 1 -host pn1180961 printenv | grep LD
>>>>>>> 
>>>>>>> The reason is that the hostname in this last command doesn't match
>>>>>>> the hostname I get when I query my interfaces, so mpirun thinks it
>>>>>>> must be a remote host - and so we get stuck in ssh until that
>>>>>>> times out. That could be quick on your machine, but it takes a
>>>>>>> while for me.
>>>>>>> 
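As a rough illustration of that interface-name check (this is not the
actual mpirun code, just a naive approximation): a launcher that compares
name strings rather than resolved addresses will treat any name it does
not recognize as remote, even if it resolves to a local IP.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Naive "is this my node?" test: compares name strings only, so a
     * host alias that resolves to a local IP still fails the test and
     * the launcher would fall through to ssh. */
    static int looks_local(const char *name)
    {
        char me[256];
        if (gethostname(me, sizeof(me)) != 0)
            return 0;
        return 0 == strcmp(name, me) || 0 == strcmp(name, "localhost");
    }

    int main(int argc, char **argv)
    {
        const char *name = argc > 1 ? argv[1] : "localhost";
        printf("%s -> %s\n", name,
               looks_local(name) ? "local" : "assumed remote, try ssh");
        return 0;
    }
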
>>>>>> This is not my case. mpirun resolves the hostname and runs env,
>>>>>> but LD_LIBRARY_PATH is not there. If I use the full name, like this:
>>>>>> # /home/glebn/openmpi/bin/mpirun -np 1 -H elfit1.voltaire.com env | grep LD_LIBRARY_PATH
>>>>>> LD_LIBRARY_PATH=/home/glebn/openmpi/lib
>>>>>> 
>>>>>> everything is OK.
>>>>>> 
>>>>> More info. If I provide the hostname to mpirun as returned by the
>>>>> "hostname" command, LD_LIBRARY_PATH is not set:
>>>>> # /home/glebn/openmpi/bin/mpirun -np 1 -H `hostname` env | grep LD
>>>>> OLDPWD=/home/glebn/OpenMPI/ompi-tests/intel_tests
>>>>> 
>>>>> If I provide any other name that resolves to the same IP, then
>>>>> LD_LIBRARY_PATH is set:
>>>>> # /home/glebn/openmpi/bin/mpirun -np 1 -H localhost env | grep LD
>>>>> OLDPWD=/home/glebn/OpenMPI/ompi-tests/intel_tests
>>>>> LD_LIBRARY_PATH=/home/glebn/openmpi/lib
>>>>> 
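A quick way to confirm that two names resolve to the same IP is a small
getaddrinfo dump - purely a diagnostic aid, not anything from the Open
MPI tree:

    #include <stdio.h>
    #include <string.h>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(int argc, char **argv)
    {
        struct addrinfo hints, *res, *r;
        char buf[INET_ADDRSTRLEN];

        if (argc < 2) {
            fprintf(stderr, "usage: %s <hostname>\n", argv[0]);
            return 1;
        }
        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_INET;          /* IPv4 only, for brevity */
        hints.ai_socktype = SOCK_STREAM;    /* one result per address */
        if (getaddrinfo(argv[1], NULL, &hints, &res) != 0)
            return 1;
        for (r = res; r != NULL; r = r->ai_next) {
            inet_ntop(AF_INET, &((struct sockaddr_in *)r->ai_addr)->sin_addr,
                      buf, sizeof(buf));
            printf("%s -> %s\n", argv[1], buf);
        }
        freeaddrinfo(res);
        return 0;
    }
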
>>>>> Here is the debug output of the "bad" run:
>>>>> /home/glebn/openmpi/bin/mpirun -np 1 -H `hostname` -mca pls_rsh_debug 1 echo
>>>>> [elfit1:14730] pls:rsh: launching job 1
>>>>> [elfit1:14730] pls:rsh: no new daemons to launch
>>>>> 
>>>>> Here is the "good" one:
>>>>> /home/glebn/openmpi/bin/mpirun -np 1 -H localhost -mca pls_rsh_debug 1 echo
>>>>> [elfit1:14752] pls:rsh: launching job 1
>>>>> [elfit1:14752] pls:rsh: local csh: 0, local sh: 1
>>>>> [elfit1:14752] pls:rsh: assuming same remote shell as local shell
>>>>> [elfit1:14752] pls:rsh: remote csh: 0, remote sh: 1
>>>>> [elfit1:14752] pls:rsh: final template argv:
>>>>> [elfit1:14752] pls:rsh:     /usr/bin/ssh <template> orted --name <template> --num_procs 1 --vpid_start 0 --nodename <template> --universe root@elfit1:default-universe-14752 --nsreplica "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" --gprreplica "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" -mca mca_base_param_file_path /home/glebn/openmpi//share/openmpi/amca-param-sets:/home/USERS/glebn/openmpiwd -mca mca_base_param_file_path_force /home/USERS/glebn/openmpiwd
>>>>> [elfit1:14752] pls:rsh: launching on node localhost
>>>>> [elfit1:14752] pls:rsh: localhost is a LOCAL node
>>>>> [elfit1:14752] pls:rsh: reset PATH: /home/glebn/openmpi/bin:/home/USERS/lenny/MPI/mpi/bin:/opt/vltmpi/OPENIB/mpi/bin:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
>>>>> [elfit1:14752] pls:rsh: reset LD_LIBRARY_PATH: /home/glebn/openmpi/lib
>>>>> [elfit1:14752] pls:rsh: changing to directory /root
>>>>> [elfit1:14752] pls:rsh: executing: (/home/glebn/openmpi/bin/orted) [orted --name 0.0.1 --num_procs 1 --vpid_start 0 --nodename localhost --universe root@elfit1:default-universe-14752 --nsreplica "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" --gprreplica "0.0.0;tcp://172.30.7.187:43017;tcp://192.168.7.187:43017" -mca mca_base_param_file_path /home/glebn/openmpi//share/openmpi/amca-param-sets:/home/USERS/glebn/openmpiwd -mca mca_base_param_file_path_force /home/USERS/glebn/openmpiwd --set-sid]
>>>>> 
>>>>> --
>>>>> Gleb.
>>>> 
>>>> 
>>> 
>>> --
>>> Gleb.
>> 
>> 
> 

