Chris -

When we look at ABI stability for Open MPI releases, we look only at the MPI and SHMEM interfaces, not the interfaces Open MPI uses internally. libopen-pal.so is an internal library, and we do not guarantee its ABI stability across minor releases. In 3.0.3, there was a backwards-incompatible change in libopen-pal.so, which is why its shared library version number was increased in a way that prevents loading the new libopen-pal.so when an application was linked against an earlier version of the library.
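To make the failure mode concrete: a binary is only affected if it records the old libopen-pal name directly in its own dynamic section (a NEEDED entry). The following is a minimal sketch, not part of Open MPI, that scans the text output of `readelf -d` for such entries. The function names and the default prefix list are hypothetical, and it assumes GNU binutils' `readelf` is on the PATH.

```python
import re
import subprocess

# Matches dynamic-section lines printed by `readelf -d`, for example:
#  0x0000000000000001 (NEEDED)   Shared library: [libopen-pal.so.40]
NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library:\s+\[([^\]]+)\]")


def needed_libraries(readelf_output):
    """Return the library names listed as NEEDED in `readelf -d` output."""
    return NEEDED_RE.findall(readelf_output)


def direct_internal_deps(readelf_output, internal_prefixes=("libopen-pal.so",)):
    """Return NEEDED entries that name an internal Open MPI library.

    The prefix list is an assumption for illustration; extend it if other
    internal libraries should be flagged.
    """
    return [
        name
        for name in needed_libraries(readelf_output)
        if name.startswith(internal_prefixes)
    ]


def check_file(path):
    """Run `readelf -d` on one binary or shared library and report direct
    dependencies on internal Open MPI libraries (hypothetical helper)."""
    result = subprocess.run(
        ["readelf", "-d", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return direct_internal_deps(result.stdout)
```

Calling `check_file()` on an application or on a third-party library it links against returns an empty list when only the public libraries are recorded, and a list such as `["libopen-pal.so.40"]` when the internal library was linked explicitly.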
In practice, this should not be a problem. The wrapper compilers (and our instructions for linking when not using the wrapper compilers) only link against libmpi.so (or a set of libraries if using Fortran), as libmpi.so contains the public interface. libmpi.so has a dependency on libopen-pal.so, so the loader will load the version of libopen-pal.so that matches the version of Open MPI used to build libmpi.so. However, if someone explicitly links against libopen-pal.so, you end up where we are today.

There’s probably a bug in HDF5’s mechanism for linking against Open MPI, since it pulled in a dependency on libopen-pal.so. That said, there may be some things we can do in the future to better handle this scenario.

Unfortunately, most of the Open MPI developers (myself included) are at the SC’18 conference this week, so it will take us some time to investigate further.

Brian

> On Nov 14, 2018, at 5:20 AM, Christopher Samuel <csam...@swin.edu.au> wrote:
>
> Hi folks,
>
> Just resub'd after a long time to ask a question about binary/backwards
> compatibility.
>
> We got bitten when upgrading from 3.0.0 to 3.0.3 which we assumed would be
> binary compatible and so (after some testing to confirm it was) replaced our
> existing 3.0.0 install with the 3.0.3 one (because we're using hierarchical
> namespaces in Lmod it meant we avoided needing to recompile everything we'd
> already built over the last 12 months with 3.0.0).
>
> However, once we'd done that we heard from a user that their code would no
> longer run because it couldn't find libopen-pal.so.40 and saw that instead
> 3.0.3 had libopen-pal.so.42.
>
> Initially we thought this was some odd build system problem, but then on
> digging further we realised that they were linking against libraries that in
> turn were built against OpenMPI (HDF5) and that those had embedded the
> libopen-pal.so.40 names.
>
> Of course our testing hadn't found that because we weren't linking against
> anything like those for our MPI tests. :-(
>
> But I was really surprised to see that these version numbers were changing, I
> thought the idea was to keep things backwardly compatible within these series?
>
> Now fortunately our reason for doing the forced upgrade (we found our 3.0.0
> didn't work with our upgrade to Slurm 18.08.3) was us missing one combination
> out of our testing whilst fault-finding and having gotten it going we've been
> able to drop back to the original 3.0.0 & fixed it for them.
>
> But is this something that you folks have come across before?
>
> All the best,
> Chris
> --
> Christopher Samuel OzGrav Senior Data Science Support
> ARC Centre of Excellence for Gravitational Wave Discovery
> http://www.ozgrav.org/ http://twitter.com/ozgrav
>
> _______________________________________________
> devel mailing list
> devel@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/devel