On Feb 17, 2010, at 3:05 PM, Ralf Wildenhues wrote:

> > The issue is that if the user has to specify -static to their linker,
> > they *also* have to specify --ompi:static, or Bad Things will happen.
> > Or, if they don't specify -static but *only* specify --ompi:static,
> > Bad Things will happen.  In short: it seems like adding yet another
> > wrapper-compiler-specific flag to the MPI ecosystem will cause
> > confusion, fear, and possibly the death of some cats.
> 
> Would you be OK with omitting -lopen-pal and -lopen-rte only on capable
> Linux systems?  With new-enough binutils, you should be able to use
> -Wl,--as-needed -Wl,--no-as-needed around these two libs.

Mmmm.  Good point.  But I don't think it helps us on Solaris or OS X, does it?
(Maybe it does on OS X?)  Or do all linkers have some kind of option like this?
(This *might* be a way out, but I would probably need to be convinced.  :-) )
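For GNU ld, Ralf's suggestion amounts to bracketing the two internal libraries on the wrapper's link line.  Below is a minimal sketch in Python.  `mpi_link_libs` is a hypothetical helper, not part of the wrapper compilers, and it only covers GNU ld.  Whether Solaris or OS X linkers have an equivalent is still the open question above.

```python
# Hypothetical helper (not Open MPI code); assumes GNU ld semantics only.
def mpi_link_libs(gnu_ld_supports_as_needed: bool) -> list[str]:
    """Return the library arguments a wrapper compiler might append to the link line."""
    internal = ["-lopen-rte", "-lopen-pal"]
    if gnu_ld_supports_as_needed:
        # --as-needed: only record DT_NEEDED for these libs if a.out uses a symbol
        # from them.  --no-as-needed restores the default for anything after them.
        return ["-lmpi", "-Wl,--as-needed", *internal, "-Wl,--no-as-needed"]
    # Fallback: today's behavior, explicit dependency on all three libraries.
    return ["-lmpi", *internal]


print(" ".join(mpi_link_libs(True)))
```

Because --as-needed only affects shared libraries, the same line would still pull libopen-rte and libopen-pal into a -static link.  In a shared link, a.out would then record only libmpi, and libmpi's own dependencies would bring in the other two at run time.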

> I'm not entirely sure I understand your reasoning for why libmpi
> from 1.5.x has to be binary incompatible, but I haven't fully thought
> through this yet.

The context for this issue is so long that much was left out of my mail.  
Here's this particular issue in a nutshell:

- Open MPI v1.4.1 has libmpi at libtool version 0:1:0 (current:revision:age),
  and libopen-rte and libopen-pal both at 0:0:0.
- Open MPI v1.4.1 links MPI apps against -lmpi -lopen-rte -lopen-pal.
- If we start .so versioning properly in v1.5, it's likely that libopen-rte and
  libopen-pal will both be 1:0:0.
  --> Note that these are both internal libraries; MPI applications do not use
      any symbols from them directly.
- Open MPI v1.5 libmpi *could* be 1:0:1.
- Hence, an a.out created for OMPI v1.4.1 would work fine with v1.5 libmpi.
- But that a.out would not work with v1.5 libopen-rte and libopen-pal.
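The list above can be checked mechanically.  Here is a minimal sketch, assuming libtool's usual Linux naming, where -version-info current:revision:age yields a soname of lib<name>.so.<current - age>.  The v1.5 numbers are the proposed ones from the list, and the helper name is hypothetical.

```python
def soname(lib: str, version_info: str) -> str:
    """Soname libtool would give lib<lib> on Linux for -version-info C:R:A (assumed scheme)."""
    current, _revision, age = (int(x) for x in version_info.split(":"))
    if age > current:
        raise ValueError("age must not exceed current")
    # The revision only changes the real file name suffix, never the soname.
    return f"lib{lib}.so.{current - age}"


# -version-info values from the list above (the v1.5 values are proposals).
v1_4_1 = {"mpi": "0:1:0", "open-rte": "0:0:0", "open-pal": "0:0:0"}
v1_5 = {"mpi": "1:0:1", "open-rte": "1:0:0", "open-pal": "1:0:0"}

# Linking with "-lmpi -lopen-rte -lopen-pal" records one DT_NEEDED entry per library.
needed = [soname(lib, v) for lib, v in v1_4_1.items()]
available = {soname(lib, v) for lib, v in v1_5.items()}

for entry in needed:
    print(entry, "found" if entry in available else "MISSING")
```

This reports libmpi.so.0 as found and libopen-rte.so.0 and libopen-pal.so.0 as missing, which is the breakage described above.  Had the v1.4.1 a.out recorded only libmpi.so.0, it would load, and v1.5 libmpi would pull in the .so.1 internal libraries itself.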

The problem is that our internal APIs change not infrequently, and potentially
in incompatible ways.  This shouldn't (and doesn't) matter to MPI applications,
but because we link with "-lmpi -lopen-rte -lopen-pal" even for shared
libraries, the linker thinks that it *does* matter: we've established an
explicit dependency from a.out to all 3 libraries.

My initial idea was to add special flags to the wrapper compilers that the user
would use to indicate whether the link line should be "-lmpi" (shared link) or
"-lmpi -lopen-rte -lopen-pal" (static link).  Brian hates this.  :-)
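To make the objection concrete, here is a rough model of that proposal in Python.  `wrapper_link_args` is hypothetical and is not how the wrapper compilers are implemented.  The flag name --ompi:static comes from the quoted text at the top.

```python
# Hypothetical model of the proposed wrapper flag; not Open MPI code.
STATIC_FLAG = "--ompi:static"


def wrapper_link_args(user_args: list[str]) -> list[str]:
    """Return the link arguments the wrapper would pass to the underlying compiler."""
    static = STATIC_FLAG in user_args
    # The wrapper consumes its own flag; the real compiler never sees it.
    args = [a for a in user_args if a != STATIC_FLAG]
    if static:
        # Static link: every internal library must appear explicitly.
        return args + ["-lmpi", "-lopen-rte", "-lopen-pal"]
    # Shared link: a.out depends only on libmpi.
    return args + ["-lmpi"]


# Correct use: both flags given.
print(" ".join(wrapper_link_args(["-static", STATIC_FLAG, "hello.o"])))
# The failure mode: -static without --ompi:static drops the internal libraries.
print(" ".join(wrapper_link_args(["-static", "hello.o"])))
```

The second call yields "-static hello.o -lmpi", which would leave libopen-rte and libopen-pal symbols unresolved in a static link.  The user has to keep two flags in sync by hand.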

Brian's idea is to make libmpi.la slurp up libopen-rte.la as a convenience 
library.  Similarly, have libopen-rte.la slurp up libopen-pal.la as a 
convenience library.  Hence, only -lmpi is needed regardless of whether you're 
linking statically or dynamically.

Regardless of which way we go, if we start .so versioning libopen-rte and
libopen-pal in v1.5, the ABI will break between v1.4 and v1.5.  We *do* need to
fix the .so versioning of libopen-rte and libopen-pal; if we don't do it for
v1.5.0, our next opportunity will be v1.7 (which is quite a long time off),
because I refuse to make a change of this size in the middle of a release
series.  Deferring it would only put off the pain until later.

Hopefully, that made sense.  :-)

-- 
Jeff Squyres
jsquy...@cisco.com

For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

