Cc'd Aleksej as I'm not sure he's on the "devel" list, and Mark Davies, as he is certainly not.
I'll also post this back onto the R HPC SIG list, which is where I came in.

Jeff Squyres wrote:
> Now, all this being said, IIRC (and I very well may not!), the real
> underlying issue here is that R is dlopening libmpi.so, which, in turn, is
> dlopening its own DSOs. Given the global linker scoping issues, OMPI's
> DSOs are unable to find the symbols they need to resolve in the process
> (because libmpi.so's was opened in a private scope).
>
> This probably is unfortunately larger than us (Open MPI) -- it's really a
> POSIX issue. What would be ideal is if different linker namespaces could
> be something more fine-grained than "global" or "private" within a
> process. E.g., if the private namespace of libmpi.so in the process could
> selectively make its symbol namespace available to the DSOs that it
> dlopens. Right now, the only option libmpi.so has is to be opened
> with a public scope, which somewhat defeats the point of private
> scoping.

Tying in with the suggestions you make above, there would seem to be a work-around for this, at least in the case of the Rmpi package on NetBSD. Furthermore, the fix does not require any alterations to OpenMPI.

Apparently there has been a similar issue, symbol visibility when chaining shared-library loads, within PAM on NetBSD. Mark Davies has now found a way to force the Rmpi package to load libmpi.so ahead of the Rmpi shared library itself, so that what appear to be the missing symbols are then available to any later loads of the OpenMPI component libraries.
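For anyone more at home in C than in R: on Unix-like systems, R's dyn.load(..., local=FALSE) asks the dynamic linker to open the library with RTLD_GLOBAL, so the preload Mark describes corresponds roughly to the sketch below. The helper name preload_global is hypothetical, made up for illustration; this is not code from Rmpi or Open MPI, just the scoping idea under those assumptions.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not part of Rmpi or Open MPI): open a shared library
 * with RTLD_GLOBAL so that its symbols join the process-wide lookup scope.
 * Libraries dlopen()ed afterwards, such as Open MPI's component DSOs, can
 * then resolve their undefined references against it.
 * Returns the handle, or NULL (after reporting dlerror()) on failure. */
void *preload_global(const char *path)
{
    /* RTLD_NOW: resolve everything immediately, so problems surface here
     * rather than at the first call into the library. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "cannot preload %s: %s\n",
                path ? path : "(main program)",
                err ? err : "unknown error");
    }
    return handle;
}
```

Calling something like preload_global("/usr/pkg/lib/libmpi.so") before dlopen()ing the extension library (which can still use RTLD_LOCAL) mirrors the ordering in the R patch: libmpi.so is already in the global scope by the time its own components are loaded.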
On the version of Rmpi that I have been using, 0.5-8, the "fix" can be effected by the following one-line patch:

--- Rmpi/R/zzz.R	2009-02-04 05:27:08.000000000 +1300
+++ Rmpi.local/R/zzz.R	2010-05-17 14:25:27.000000000 +1200
@@ -7,6 +7,7 @@
 #    cat(vertxt)
 
 # Check if lam-mpi is running
+    dyn.load("/usr/pkg/lib/libmpi.so", local=FALSE)
     library.dynam("Rmpi", pkg, lib)
     if (!TRUE)
         stop("Fail to load Rmpi dynamic library.")

Note that this currently hard-codes the path to libmpi.so, which on our system is in the standard NetBSD pkgsrc location; there are probably "nicer" ways to achieve the same end, with greater flexibility, using R internals. Having said that, this "fix" does not seem to be needed on platforms that have a global scope for shared-library symbols, so attempts to make it generic may well be pointless.

Thanks for everyone's time on this issue. I'll certainly be watching attempts to resolve the "larger than us (Open MPI)" issue,

Kevin

--
Kevin M. Buckley                                  Room:  CO327
School of Engineering and                         Phone: +64 4 463 5971
 Computer Science
Victoria University of Wellington
New Zealand