FYI -- there is a complex issue about shared library versioning and binary compatibility that we have punted on for v1.3.4. Hopefully we'll think of a proper solution for v1.4.

If you care about such things, please read #2092.


Begin forwarded message:

From: "Open MPI" <b...@open-mpi.org>
Date: November 4, 2009 3:44:06 PM EST
Cc: <b...@osl.iu.edu>
Subject: [Open MPI] #2092: libopen-rte and libopen-pal shared library versioning issues

#2092: libopen-rte and libopen-pal shared library versioning issues
--------------------- +------------------------------------------------------
Reporter:  jsquyres  |       Owner:
    Type:  defect    |      Status:  new
Priority:  critical  |   Milestone:  Open MPI 1.4
 Version:  trunk     |    Keywords:
--------------------- +------------------------------------------------------
 mpicc currently links all of OMPI's libraries:

 {{{
 -lmpi -lopen-rte -lopen-pal
 }}}

(The other wrappers are similar.)  When linking against shared libraries,
this is both unnecessary and Bad: the MPI application ends up
''explicitly'' depending on libopen-rte and libopen-pal rather than
''implicitly'' depending on them (i.e., only through libmpi).  The difference is that with explicit dependencies, the MPI app is chained to the .so version numbers of libopen-rte and libopen-pal, even though MPI apps don't call
anything in those libraries directly.

(See [wiki:ReleaseProcedures the Libtool .so version rules] before reading further.)

This can be problematic.  Consider:

 * OMPI version A: has libmpi 0:0:0, libopen-rte 0:0:0, libopen-pal 0:0:0
 * OMPI version B: has libmpi 0:1:0, libopen-rte 1:0:0, libopen-pal 1:0:0

An MPI app compiled against OMPI vA ''should'' be forward compatible with OMPI vB because the MPI interfaces haven't changed (libmpi only bumped its revision).  But since the MPI app is explicitly dependent on libopen-rte and libopen-pal, it ''won't'' be binary compatible: it still asks for the vA versions of those libraries, whose interface numbers vB no longer supports.  This happens even though the MPI app doesn't call anything in libopen-rte or libopen-pal; only libmpi does, and libmpi presumably has been adjusted for any ORTE/OPAL interface changes.  This is Bad.
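To make the vA/vB example concrete, here's a rough Python sketch (not OMPI code; all function names are made up for illustration).  It assumes Linux, where libtool turns -version-info C:R:A into a file named lib.so.(C-A).A.R whose soname is lib.so.(C-A); other platforms use other naming schemes.

```python
from __future__ import annotations

# Assumption: libtool's Linux naming scheme. Other platforms differ.

def linux_names(lib: str, version_info: str) -> tuple[str, str]:
    """Return (real file name, soname) for a libtool C:R:A triple."""
    current, revision, age = (int(x) for x in version_info.split(":"))
    if age > current:
        raise ValueError("age must not exceed current")
    major = current - age
    return f"{lib}.so.{major}.{age}.{revision}", f"{lib}.so.{major}"

# The two OMPI versions from the ticket.
ompi_a = {"libmpi": "0:0:0", "libopen-rte": "0:0:0", "libopen-pal": "0:0:0"}
ompi_b = {"libmpi": "0:1:0", "libopen-rte": "1:0:0", "libopen-pal": "1:0:0"}

def sonames(release: dict[str, str]) -> dict[str, str]:
    """Map each library in a release to its soname."""
    return {lib: linux_names(lib, vi)[1] for lib, vi in release.items()}

def missing_at_runtime(needed: list[str], installed: dict[str, str]) -> list[str]:
    """Sonames recorded in the app at link time that the installed release lacks."""
    provided = set(sonames(installed).values())
    return [s for s in needed if s not in provided]

# Linking with "-lmpi -lopen-rte -lopen-pal" against vA records all three
# vA sonames in the app; linking with only "-lmpi" records just libmpi's
# (libmpi itself then pulls in whichever rte/pal it was built against).
explicit = list(sonames(ompi_a).values())
implicit = [sonames(ompi_a)["libmpi"]]

if __name__ == "__main__":
    print(missing_at_runtime(explicit, ompi_b))
    print(missing_at_runtime(implicit, ompi_b))
```

Under vB, the explicitly linked app is missing libopen-rte.so.0 and libopen-pal.so.0, while the -lmpi-only app finds libmpi.so.0 and is fine.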

Unfortunately, listing -lopen-rte and -lopen-pal in the wrappers is
necessary for static linking: when all the libs are .a's, nothing records
that libmpi needs libopen-rte and libopen-pal, so they must be listed explicitly.
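A toy Python sketch of the static vs. shared difference.  The dependency graph is an assumption read off the wrapper's link order (libmpi uses libopen-rte and libopen-pal; libopen-rte uses libopen-pal), and link_flags is a made-up helper, not anything in the real wrappers.

```python
from __future__ import annotations

# Assumed dependency graph, mirroring "-lmpi -lopen-rte -lopen-pal".
DEPS = {
    "mpi": ["open-rte", "open-pal"],
    "open-rte": ["open-pal"],
    "open-pal": [],
}

def link_flags(top: str, static: bool) -> list[str]:
    """Hypothetical: the -l flags an app calling only `top` would need."""
    if not static:
        # A shared lib records its own dependencies, so naming the
        # library the app calls directly is enough.
        return [f"-l{top}"]
    # .a archives record no dependencies: emit the transitive closure,
    # with each library placed before the libraries it uses (a one-pass
    # Unix linker resolves symbols left to right).
    order: list[str] = []
    seen: set[str] = set()

    def visit(lib: str) -> None:
        if lib in seen:
            return
        seen.add(lib)
        for dep in DEPS[lib]:
            visit(dep)
        order.append(lib)  # post-order: dependencies land first

    visit(top)
    # Reversed post-order puts dependents before their dependencies.
    return [f"-l{lib}" for lib in reversed(order)]
```

Here link_flags("mpi", static=True) reproduces the three flags mpicc emits today, while the shared case needs only -lmpi.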

So -- how to fix this? We kicked around a few ideas, but none of them are
 good.  Recording them here for posterity:

 1. Collapse libopen-rte and libopen-pal into a single libmpi.  We don't like this because:
    * We like 3 libs because the separation prevents developers from making abstraction violations.
    * Other projects are now depending on libopen-rte and libopen-pal.
 1. Only collapse libopen-rte/libopen-pal -> libmpi in production builds; keep the 3 libs for developer builds.
    * This seems confusing, and still has the problem that other projects depend on these libraries.
 1. We could figure out in configure whether we're building static or dynamic, and adjust the Makefile.am-isms to build one big libmpi for static and 3 libs for dynamic.  The wrappers would then always list only -lmpi (not -lopen-rte, etc.).
    * But what do we do when users configure with --enable-static --enable-shared?
 1. We could only allow building static ''or'' shared, not both simultaneously.
    * This might annoy some people...?
 1. We could add logic to the wrappers to look at the libraries in $libdir and figure out whether to list just -lmpi or also -lopen-rte, etc.
    * The wrapper would have to know what the shared library extension(s) are for that platform (and they vary).  This is possible, but icky.
    * The wrapper would then have to parse the compiler and linker flags passed via argv to see whether static or dynamic linking is being forced.  These flags vary wildly across platforms and compilers.  It seems like the only winning move here is not to play.
 1. We could leave the libopen-rte and libopen-pal .so version numbers at 0:0:0 and avoid the issue.
    * We're doing this to get v1.3.4 out the door.
    * But we really should figure out something "better" for v1.4, because we're doing a disservice to projects using these libraries: their .so version numbers never signal interface changes.
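For a feel of why the wrapper-inspection idea is icky, here's a hypothetical Python sketch of that logic.  The suffix list, the "force static" flags, and all function names are assumptions for illustration; the real wrappers do nothing like this.

```python
from __future__ import annotations

# Assumed shared-library suffixes; incomplete by design (platforms vary).
SHARED_SUFFIXES = (".so", ".dylib", ".sl")
# Assumed GCC/GNU ld spellings only; ignores ordering subtleties such as
# a later -Wl,-Bdynamic, and other compilers spell this differently.
STATIC_FLAGS = {"-static", "-Wl,-Bstatic"}
LIBS_ALL = ["-lmpi", "-lopen-rte", "-lopen-pal"]

def has_shared_libmpi(libdir_names: list[str]) -> bool:
    """True if some file name in $libdir looks like a shared libmpi."""
    for name in libdir_names:
        for suffix in SHARED_SUFFIXES:
            # libmpi.so, libmpi.so.0, libmpi.so.0.0.1, libmpi.dylib, ...
            if name == "libmpi" + suffix or name.startswith("libmpi" + suffix + "."):
                return True
        # macOS puts the version before the suffix, e.g. libmpi.0.dylib
        if name.startswith("libmpi.") and name.endswith(".dylib"):
            return True
    return False

def wrapper_link_libs(argv: list[str], libdir_names: list[str]) -> list[str]:
    """Hypothetical: list only -lmpi when we believe linking is dynamic."""
    static_forced = any(arg in STATIC_FLAGS for arg in argv)
    if has_shared_libmpi(libdir_names) and not static_forced:
        return ["-lmpi"]
    return list(LIBS_ALL)
```

Even this toy version hard-codes guesses about file names and flags, which is exactly the part that doesn't generalize across platforms and compilers.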

 '''NOTE:''' This issue potentially has ramifications for binary
compatibility of MPI applications in the v1.3 and v1.4 series with the upcoming v1.5 series.  That is, if we ''do'' properly version libopen-rte/pal in v1.5, apps linked against the rte/pal .so libs from the v1.3/v1.4
series may find that the v1.5 libs have incompatible "current" and "age" values.
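A rough Python sketch of libtool's interface-number rule (a library at C:R:A implements interfaces C-A through C), applied to this NOTE.  The v1.5 version numbers below are hypothetical, and the helper names are illustrative only.

```python
def interfaces(version_info: str) -> range:
    """Interface numbers implemented by a library at libtool C:R:A."""
    current, _revision, age = (int(x) for x in version_info.split(":"))
    return range(current - age, current + 1)

def still_compatible(linked_against: str, installed: str) -> bool:
    """Roughly, per libtool's model: the app may use any interface up to
    the `current` of the library it was linked against, so the installed
    library must still implement that interface number."""
    current = int(linked_against.split(":")[0])
    return current in interfaces(installed)

# rte/pal in v1.3/v1.4 stay at 0:0:0; two hypothetical v1.5 values follow.
V13_RTE = "0:0:0"
print(still_compatible(V13_RTE, "1:0:1"))  # True: v1.5 only added interfaces
print(still_compatible(V13_RTE, "1:0:0"))  # False: v1.5 changed/removed one
```

So whether v1.3/v1.4 apps survive depends on whether v1.5's rte/pal changes can be expressed by bumping "age" along with "current".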

--
Ticket URL: <https://svn.open-mpi.org/trac/ompi/ticket/2092>
Open MPI <http://www.open-mpi.org/>




--
Jeff Squyres
jsquy...@cisco.com
