Sorry for the delay in replying.

I think that the issue here is the well-known libltdl "reporting the wrong 
error message" issue.  

Specifically, sometimes libltdl fails to load a DSO for a good reason, but then 
libltdl fails to report the right reason as to why it failed to load the DSO.  
Open MPI uses the function lt_dlerror() to get a printable string reason for 
why a DSO fails to load.  But sometimes that string reason is *wrong* (i.e., 
the DSO didn't load, but the reason OMPI printed out as to *why* it didn't load 
is incorrect).  And therefore what OMPI prints out is misleading, at best.

Over time, we have tried two things to make this error message better:

1. When we detect the "wrong" error message (i.e., if lt_dlerror() returns 
"file not found"), we actually use stat() to check for the presence of the file 
we were trying to open.  If we find the file, then we don't print the 
lt_dlerror(), but instead print the message you see:

[europa.ecs.vuw.ac.nz:09687] mca: base: component_find: unable to open
/usr/pkg/lib/openmpi/mca_carto_auto_detect: perhaps a missing symbol, or
compiled for a different version of Open MPI? (ignored)

So the error message is at least *somewhat* better than a totally misleading 
"file not found" message -- but it still only speculates on the real reason 
that libltdl failed to load the DSO.

2. https://svn.open-mpi.org/trac/ompi/changeset/22806 put in an OMPI-specific 
change to libltdl that avoids the incorrect error message altogether.  So now 
OMPI should print out the *real* reason libltdl failed to load the DSO.

It does not look like this patch made it over into the v1.4 series; it is 
awaiting review before it moves to the v1.5 branch 
(https://svn.open-mpi.org/trac/ompi/ticket/2337).  
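
To make the stat() check from item 1 concrete, here is a minimal sketch in C. It is not the actual Open MPI code: the function name describe_load_failure() and its signature are made up for illustration, and the loader's message is passed in as a plain string so that the sketch does not depend on libltdl. It assumes the misleading message contains the text "file not found", as described above.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper (not Open MPI's real implementation): given the
 * path that failed to load and the message the loader reported, write a
 * diagnostic into buf.  Returns 1 if the loader's message was judged
 * misleading and replaced, 0 if it was passed through unchanged. */
int describe_load_failure(const char *path, const char *loader_msg,
                          char *buf, size_t len)
{
    struct stat st;

    /* Only second-guess the loader when it claims the file is missing
     * and stat() shows that the file is actually there. */
    if (loader_msg != NULL &&
        strstr(loader_msg, "file not found") != NULL &&
        stat(path, &st) == 0) {
        snprintf(buf, len,
                 "unable to open %s: perhaps a missing symbol, or compiled "
                 "for a different version of Open MPI?", path);
        return 1;
    }

    /* Otherwise report whatever the loader said. */
    snprintf(buf, len, "unable to open %s: %s", path,
             loader_msg != NULL ? loader_msg : "unknown error");
    return 0;
}
```

Note that the replacement message is still only a guess; all the check proves is that "file not found" cannot be the real reason.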

Hope that all made sense!

-----

Now, all this being said, IIRC (and I very well may not!), the real underlying 
issue here is that R is dlopening libmpi.so, which, in turn, is dlopening its 
own DSOs.  Given the global linker scoping issues, OMPI's DSOs are unable to 
find the symbols they need to resolve in the process (because libmpi.so was 
opened in a private scope).

Unfortunately, this is probably larger than us (Open MPI) -- it's really a POSIX 
issue.  What would be ideal is if different linker namespaces could be 
something more fine-grained than "global" or "private" within a process.  E.g., 
if the private namespace of libmpi.so in the process could selectively make its 
symbol namespace available to the DSOs that it dlopens.  Right now, the only 
option libmpi.so has is to be opened with a public scope, which somewhat 
defeats the point of private scoping.
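
For reference, the "global" versus "private" choice is made by whoever calls dlopen(), via the RTLD_GLOBAL and RTLD_LOCAL flags. The sketch below is only an illustration of that choice: open_with_scope() is a hypothetical helper, not something that R or Open MPI provides, and I have not checked how R actually opens libmpi.so.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: open a library with either global ("public") or
 * local ("private") symbol scope.  With RTLD_GLOBAL, the library's
 * symbols can be used to resolve references in libraries that are
 * loaded afterwards; with RTLD_LOCAL they cannot. */
void *open_with_scope(const char *path, int make_global)
{
    int flags = RTLD_NOW | (make_global ? RTLD_GLOBAL : RTLD_LOCAL);
    void *handle = dlopen(path, flags);

    if (handle == NULL) {
        /* dlerror() returns a human-readable reason, or NULL if none. */
        const char *msg = dlerror();
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                path, msg != NULL ? msg : "unknown error");
    }
    return handle;
}
```

A caller that wanted libmpi.so's symbols visible to the plugins it loads later would pass a non-zero make_global, at the cost of exposing those symbols to everything else in the process. On some systems this needs to be linked with -ldl.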

Have you tried building Open MPI with the --disable-dlopen configure flag?  
This will slurp all of OMPI's DSOs up into libmpi.so -- so there's no dlopening 
at run-time.  Hence, your app (R) can dlopen libmpi.so, but then libmpi.so 
doesn't dlopen anything else -- all of OMPI's plugins are physically located in 
libmpi.so.




On May 11, 2010, at 8:33 PM, <kevin.buck...@ecs.vuw.ac.nz> 
<kevin.buck...@ecs.vuw.ac.nz> wrote:

> 
> > Which libltdl version is that NetBSD ltdl.h from?  Which version is
> > in opal/libltdl?  Have you tried not doing the above change?
> >
> > libltdl 2.2.x has incompatible changes over 1.5.x, both in the library
> > as well as in the header, as well as (I think) in preloaded modules.
> 
> Hey Ralf,
> 
> The libtool distinfo file implies NetBSD currently uses libtool-2.2.6b.
> 
> An ldd of mpirun shows  -lltdl.7 => /usr/pkg/lib/libltdl.so.7
> 
> 
> I do need to attempt a build of 1.4.2 here in ECS, so I'll try
> building without the patches but I seem to recall that if those
> libtool-related patches
> 
> opal/Makefile.in
> configure
> opal/mca/base/mca_base_component_find.c
> opal/mca/base/mca_base_component_repository.c
> test/support/components.h
> test/support/components.c
> 
> were not applied, it did not even build. But we'll see.
> 
> 
> And if you are reading this, Alexsej, have you,as the real
> "OpenMPI on NetBSD" man, built a 1.4.2 as yet ?
> 
> Kevin
> 
> --
> Kevin M. Buckley                                  Room:  CO327
> School of Engineering and                         Phone: +64 4 463 5971
>  Computer Science
> Victoria University of Wellington
> New Zealand
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

