FYI -- there is a complex issue about shared library versioning and
binary compatibility that we have punted on for v1.3.4. Hopefully
we'll think of a proper solution for v1.4.
If you care about such things, please read #2092.
Begin forwarded message:
From: "Open MPI" <b...@open-mpi.org>
Date: November 4, 2009 3:44:06 PM EST
Cc: <b...@osl.iu.edu>
Subject: [Open MPI] #2092: libopen-rte and libopen-pal shared
library versioning issues
#2092: libopen-rte and libopen-pal shared library versioning issues
---------------------+------------------------------------------------------
 Reporter:  jsquyres |      Owner:
     Type:  defect   |     Status:  new
 Priority:  critical |  Milestone:  Open MPI 1.4
  Version:  trunk    |   Keywords:
---------------------+------------------------------------------------------
mpicc currently links all of OMPI's libraries:
{{{
-lmpi -lopen-rte -lopen-pal
}}}
(similar for the other wrappers). When linking against shared libraries,
this is both unnecessary and Bad -- the MPI application ends up
''explicitly'' depending on libopen-rte and libopen-pal rather than
''implicitly'' depending on them. The difference is that with explicit
dependencies, the MPI app is then chained to the .so version numbers of
libopen-rte and libopen-pal -- even though MPI apps don't explicitly call
anything down in those libraries.
(see [wiki:ReleaseProcedures the Libtool .so version rules] before reading
further)
This can be problematic -- consider:
* OMPI version A: has libmpi 0:0:0, libopen-rte 0:0:0, libopen-pal 0:0:0
* OMPI version B: has libmpi 0:1:0, libopen-rte 1:0:0, libopen-pal 1:0:0
An MPI app compiled against OMPI vA ''should'' be forward compatible with
OMPI vB because the MPI interfaces haven't changed. But since the MPI app
is explicitly dependent on libopen-rte and libopen-pal, it ''won't'' be
binary compatible (even though the MPI app doesn't call anything down in
libopen-rte or libopen-pal -- only libmpi does, and libmpi presumably has
been adjusted for any ORTE/OPAL interface changes). This is Bad.
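To make the vA/vB example concrete, here is a minimal Python sketch. It assumes the Linux convention, where Libtool maps `current:revision:age` to a soname major number of `current - age`; other platforms differ. The function names `linux_soname` and `missing_sonames` are made up for illustration and are not part of OMPI or Libtool.

```python
def linux_soname(lib, version_info):
    """Map a Libtool 'current:revision:age' triple to (soname, file name).

    Assumes the Linux convention: major = current - age.
    """
    current, revision, age = (int(x) for x in version_info.split(":"))
    major = current - age
    soname = f"lib{lib}.so.{major}"
    filename = f"{soname}.{age}.{revision}"
    return soname, filename


# The two hypothetical releases from the ticket.
version_a = {"mpi": "0:0:0", "open-rte": "0:0:0", "open-pal": "0:0:0"}
version_b = {"mpi": "0:1:0", "open-rte": "1:0:0", "open-pal": "1:0:0"}


def missing_sonames(built_against, installed):
    """Sonames the app recorded at link time that the install no longer provides."""
    needed = {linux_soname(lib, v)[0] for lib, v in built_against.items()}
    provided = {linux_soname(lib, v)[0] for lib, v in installed.items()}
    return sorted(needed - provided)


# App linked with -lmpi -lopen-rte -lopen-pal against vA, run against vB:
print(missing_sonames(version_a, version_b))
# App linked with only -lmpi against vA, run against vB:
print(missing_sonames({"mpi": "0:0:0"}, version_b))
```

Under these assumptions, the first call reports the libopen-rte and libopen-pal sonames as missing, while the second reports nothing missing, because libmpi's soname is unchanged between vA and vB.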
Unfortunately, listing -lopen-rte and -lopen-pal in the wrappers is
necessary because of the case of static linking -- where all the libs are
.a's, and therefore need to be explicitly mentioned.
So -- how to fix this? We kicked around a few ideas, but none of them are
good. Recording them here for posterity:
1. Collapse libopen-rte and libopen-pal into a single libmpi. We don't
   like this because:
   * We like 3 libs because it prevents developers from making abstraction
     violations.
   * Other projects are now depending on libopen-rte and libopen-pal.
1. Only collapse libopen-rte/libopen-pal -> libmpi in production builds;
   keep the 3 libs for developer builds.
   * This seems confusing, and still has the problem that other projects
     depend on these libraries.
1. We could figure out in configure whether we're building static or
   dynamic and adjust Makefile.am-isms to build one big libmpi for static
   and 3 libs for dynamic -- and then just have the wrappers always list
   only -lmpi (not -lopen-rte, etc.).
   * But what to do when users specify --enable-static --enable-shared?
1. We could only allow building static ''or'' shared -- not both
   simultaneously.
   * This might annoy some people...?
1. We could add logic to the wrappers to look at the libraries in $libdir
   and figure out whether to list just -lmpi or also -lopen-rte, etc.
   * The wrapper would have to know what the shared library extension(s)
     are for that platform (and they vary). This is possible, but icky.
   * The wrapper then has to parse the compiler and linker flags passed
     via argv to see if static or dynamic linking is being forced. These
     flags vary wildly on different platforms and different compilers. It
     seems like the only winning move here is not to play.
1. We could leave the libopen-rte and libopen-pal .so version numbers as
   0:0:0 and avoid the issue.
   * We're doing this to get v1.3.4 out the door.
   * But we really should figure out something "better" for v1.4 --
     because we're doing a disservice to projects using these libraries.
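For illustration only, the idea of having the wrappers inspect $libdir could be sketched roughly as follows in Python. This is not how the OMPI wrappers work today; the `wrapper_libs` function and the `SHARED_EXTS` list are hypothetical, and the sketch deliberately ignores the harder half of the problem named above (parsing argv for flags that force static or dynamic linking).

```python
import os
import tempfile

# Hypothetical extension list; the real set varies by platform.
SHARED_EXTS = (".so", ".dylib", ".sl", ".dll")


def wrapper_libs(libdir, shared_exts=SHARED_EXTS):
    """Return the -l flags a wrapper might emit, based on what is in libdir.

    If a shared libmpi is present, list only -lmpi so the app depends on
    libopen-rte and libopen-pal implicitly; otherwise list all three,
    as static linking requires.
    """
    has_shared_mpi = any(
        os.path.exists(os.path.join(libdir, "libmpi" + ext))
        for ext in shared_exts
    )
    if has_shared_mpi:
        return ["-lmpi"]
    return ["-lmpi", "-lopen-rte", "-lopen-pal"]


with tempfile.TemporaryDirectory() as libdir:
    # Empty directory stands in for a static-only install.
    print(wrapper_libs(libdir))
    # Create an empty placeholder file to stand in for a shared libmpi.
    open(os.path.join(libdir, "libmpi.so"), "w").close()
    print(wrapper_libs(libdir))
```

Even this toy version needs a per-platform extension table, and it gives the wrong answer when both static and shared libraries are installed and the user forces static linking, which is the "icky" part described in the list.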
'''NOTE:''' This issue potentially has ramifications about binary
compatibility of MPI applications in the v1.3 and v1.4 series with the
upcoming v1.5 series. Meaning that if we ''do'' properly version
libopen-rte/pal in v1.5, apps linked against rte/pal .so libs from the
v1.3/v1.4 series may have incompatible "current" and "age" values.
--
Ticket URL: <https://svn.open-mpi.org/trac/ompi/ticket/2092>
Open MPI <http://www.open-mpi.org/>
--
Jeff Squyres
jsquy...@cisco.com