Houston, we have a problem.

libmpi_f90.so had changes for the upcoming 1.4.4 release that require a .so 
version bump.  Specifically, some MPI F90 bindings used to have some parameters 
of type INTEGER.  In 1.4.4, those parameter types were corrected to be 
INTEGER(KIND=MPI_ADDRESS_KIND).
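To show why this is an ABI change, here is a small C sketch (not Open MPI code) of what happens when the caller and callee disagree on an argument's size. It assumes a typical 64-bit build where default INTEGER is 4 bytes and INTEGER(KIND=MPI_ADDRESS_KIND) is 8 bytes, and that the Fortran argument is passed by reference. The function name `callee_view_of_arg` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: the caller passes a 4-byte default INTEGER
 * (int32_t) by reference, but the callee was compiled expecting an
 * 8-byte INTEGER(KIND=MPI_ADDRESS_KIND) (int64_t on a typical 64-bit
 * build).  `memory` stands in for the caller's storage: the 4-byte
 * argument followed by 4 bytes that belong to something else. */
int64_t callee_view_of_arg(int32_t caller_value, uint32_t neighbor_bytes)
{
    unsigned char memory[8];
    memcpy(memory, &caller_value, sizeof caller_value);        /* the real argument */
    memcpy(memory + 4, &neighbor_bytes, sizeof neighbor_bytes); /* unrelated data after it */

    int64_t seen;
    /* The callee reads twice as many bytes as the caller provided. */
    assert(sizeof seen == 2 * sizeof caller_value);
    memcpy(&seen, memory, sizeof seen);
    return seen;
}
```

The value the callee sees depends on whatever bytes happen to follow the caller's variable. If the callee also writes the argument back, it overwrites 4 bytes it does not own, which is one way to end up with a segv.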

 * 1.4.3 value: 0:1:0
 * 1.4.4 value: 1:0:0
   --> bumped current & reset revision because param types on some interfaces changed 

Unfortunately, libmpi_f90.so has already been released in v1.5 with the value 
1:0:0.  So... what do we do?  
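Assuming these triples are libtool `-version-info` current:revision:age values and the usual Linux/ELF naming scheme (lib<name>.so.<current-age>.<age>.<revision>), the following C sketch shows why the clash matters: 1:0:0 yields the same installed file name, libmpi_f90.so.1.0.0, in both 1.4.4 and v1.5. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A libtool -version-info triple, written current:revision:age. */
struct lt_version {
    int current;
    int revision;
    int age;
};

/* Major number of the soname that linked executables record
 * (Linux/ELF naming as produced by libtool). */
int lt_soname_major(struct lt_version v)
{
    /* libtool requires 0 <= age <= current and a non-negative revision */
    assert(v.age >= 0 && v.age <= v.current && v.revision >= 0);
    return v.current - v.age;
}

/* Writes the installed file name: lib<name>.so.<current-age>.<age>.<revision> */
void lt_filename(const char *name, struct lt_version v, char *buf, size_t len)
{
    snprintf(buf, len, "lib%s.so.%d.%d.%d",
             name, lt_soname_major(v), v.age, v.revision);
}

/* Returns 1 if two triples produce the identical installed file name. */
int lt_same_file(const char *name, struct lt_version a, struct lt_version b)
{
    char fa[128], fb[128];
    lt_filename(name, a, fa, sizeof fa);
    lt_filename(name, b, fb, sizeof fb);
    return strcmp(fa, fb) == 0;
}
```

Under these assumptions, 0:1:0 (1.4.3) gives libmpi_f90.so.0.0.1 with soname major 0, while 1:0:0 gives libmpi_f90.so.1.0.0 with soname major 1, whether it comes from 1.4.4 or v1.5.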

Before discussing options, let's review a few things:

1. Remember that two different versions of OMPI cannot be installed into the 
same tree.  .so version numbers *help*, but there are still other support files 
that OMPI does not version.  Hence, if you have 2 versions of OMPI, you *must* 
install them to different installation trees.

2. If you compile your MPI application with OMPI version A, you can run it with 
OMPI version B (provided that both A and B are ABI-compatible with each other), 
usually by updating your LD_LIBRARY_PATH.

3. To be clear, you can do something like this:

$ /ompi-vA-install/bin/mpicc ring.c -o ring
$ export LD_LIBRARY_PATH=/ompi-vB-install/lib
$ /ompi-vB-install/bin/mpirun -np 4 ring

4. However, if A and B are *not* ABI compatible, the .so version numbers are 
supposed to protect you such that the above example would not work.  When you 
try to mpirun, you would get an error message from the run-time linker that 
ring is not compatible with B's libmpi.so (for example).

5. The particular F90 changes that were made were only to the "large" F90 
module size, which is not the default (you have to specify 
--with-f90-module=large to OMPI's configure).

6. OMPI versions 1.3.2 and later are supposed to be ABI compatible with all 
remaining versions of 1.3.x and all versions of 1.4.x.
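Points 2 to 4 can be modelled roughly as follows. This is a simplified sketch, assuming Linux/ELF and libtool naming: the executable records the soname lib<name>.so.<current-age> of the library it was linked against, and the run-time linker only binds it to a library on LD_LIBRARY_PATH whose soname is exactly that string. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* libtool current:revision:age triple */
struct lt_version { int current, revision, age; };

/* Soname libtool produces on Linux: lib<name>.so.<current-age> */
void lt_soname(const char *name, struct lt_version v, char *buf, size_t len)
{
    assert(v.age >= 0 && v.age <= v.current);
    snprintf(buf, len, "lib%s.so.%d", name, v.current - v.age);
}

/* Simplified run-time linker check: a binary built against `built`
 * stores that library's soname; it can only bind to `installed` if
 * the installed library carries the same soname.
 * Returns 1 if the binary would load, 0 if the linker refuses it. */
int loader_accepts(const char *name, struct lt_version built,
                   struct lt_version installed)
{
    char needed[128], provided[128];
    lt_soname(name, built, needed, sizeof needed);
    lt_soname(name, installed, provided, sizeof provided);
    return strcmp(needed, provided) == 0;
}
```

Under this model, a binary linked against 0:1:0 is refused by a 1:0:0 library (the protection described in point 4) but is accepted by a 0:2:0 library, even if that library's interfaces have changed.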

-----

So -- with all that in mind -- let's talk about what to do for 1.4.4.  I see a 
few options:

1. Go with 1:0:0 anyway.  

   CONSEQUENCE: We have two different versions of libmpi_f90.so out there with 
1:0:0 which are not compatible with each other.

   IMPACT: Probably pretty minimal -- not too many people use the "large" F90 
bindings.  And no one has noticed the wrong bindings that we included in 
versions <=1.4.3, so it's unlikely that anyone is using these particular 
interfaces.

2. Go with 0:2:0.

   CONSEQUENCE: This is somewhat of a lie; we're saying we haven't modified the 
interface.  But we did.

   IMPACT: Same as above.  A binary using the old/wrong interfaces (e.g., 
compiled against 1.4.3) could still run-time link against OMPI 1.4.4 and 
possibly segv because the parameters are different sizes.

3. Don't fix the Fortran bindings in 1.4.x -- fix them in 1.5.4.

   CONSEQUENCE: Leave them broken.  There's at least one user who would be 
annoyed by this (i.e., the one who reported the problem to us).

   IMPACT: We can fix this in 1.5.4.  We already have many old versions of OMPI 
that have these broken bindings.  What's one more?  It might be an easier thing 
to say "The bindings are fixed in 1.5.4 and higher" rather than "The bindings 
are fixed in 1.4.x, where x>=4 and 1.5.y, where y>=4".
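The trade-off between options 1 and 2 can be summarized with the same libtool arithmetic. This is a sketch assuming the triples above and Linux soname rules; `evaluate` and `struct outcome` are illustrative names, not anything in the OMPI tree.

```c
#include <assert.h>

struct lt_version { int current, revision, age; };

static int soname_major(struct lt_version v)
{
    assert(v.age >= 0 && v.age <= v.current);
    return v.current - v.age;
}

static int same_triple(struct lt_version a, struct lt_version b)
{
    return a.current == b.current && a.revision == b.revision && a.age == b.age;
}

/* Hypothetical summary of a candidate triple for libmpi_f90 in 1.4.4. */
struct outcome {
    int old_binaries_load;  /* a binary built against 1.4.3 (0:1:0) binds to it */
    int collides_with_v15;  /* same triple, hence same file name, as v1.5's 1:0:0 */
};

struct outcome evaluate(struct lt_version candidate)
{
    const struct lt_version v143 = {0, 1, 0};
    const struct lt_version v15  = {1, 0, 0};
    struct outcome o;
    o.old_binaries_load = soname_major(candidate) == soname_major(v143);
    o.collides_with_v15 = same_triple(candidate, v15);
    return o;
}
```

Option 1 (1:0:0) stops 1.4.3 binaries from binding to the changed interfaces but reuses v1.5's triple; option 2 (0:2:0) avoids the reuse but lets 1.4.3 binaries bind to the changed interfaces. Option 3 sidesteps both in 1.4.x by not changing the interfaces there.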

None of the options are good.

I'm somewhat leaning towards #3.  

Opinions?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

