On Apr 24, 2009, at 8:08 AM, Number Cruncher wrote:

Compiler (g++ 4.1.2) output when using optimization (-O2):
/opt/cfs/include/openmpi/ompi/mpi/cxx/datatype.h: In constructor
‘MPI::Datatype::Datatype()’:
/opt/cfs/include/openmpi/ompi/mpi/cxx/datatype.h:68: warning:
type-punning to incomplete type might break strict-aliasing rules


These lines are mostly of the form:

  inline Datatype() : mpi_datatype(MPI_DATATYPE_NULL) { }

Assumedly the problem is with MPI_DATATYPE_NULL (i.e., MPI_*_NULL for the other types). More below.

Without the C++ bindings, I still get:
helloMPI.cpp: In function ‘int main(int, char**)’:
helloMPI.cpp:8: warning: type-punning to incomplete type might break
strict-aliasing rules


The odd thing is that these warnings only seem to appear in gcc 4.1 output. They do not appear in gcc 3.4.6 and 4.4.0 (I don't know if they appear in intermediate versions or not).

I've followed the discussion at ompi/communicator/communicator.h and
looked at the major changeset at
https://svn.open-mpi.org/trac/ompi/changeset/20627 .


This was the changeset where we did the ABI fixes -- ensuring that if you compile/link against Open MPI vA.B.C, you will be able to just change your LD_LIBRARY_PATH or replace libmpi.so when upgrading to Open MPI vA.B.x.

The issue here is surprisingly complex; it took us a long time to come up with the current solution.

For the purpose of this discussion, assume OMPI vA.B.C is built with shared libraries (libmpi.so and friends), and a user application (hello.c) compiles and links against them.

As you noted, OMPI's handles are all pointers -- e.g., MPI_Comm is (ompi_communicator_t*). We purposefully do not include the struct definition of ompi_communicator_t in mpi.h because it's private -- we don't want users to go poking around in there. But -- to make a long story short -- the linker will embed the length of the ompi_communicator_t in the hello executable. Among other reasons, it's so that pointer math can be computed properly. However, let's assume the user upgrades to OMPI vA.B.(C+1) in place, meaning that libmpi.so is replaced with a new version. If we've changed the back- end ompi_communicator_t struct (e.g., added a member), then the size that is included in the hello executable no longer matches the size that is in libmpi.so -- and Bad Things can (and do) happen.

Hence, the solution of r20627: make a "dummy" type that is guaranteed to be larger than ompi_communicator_t -- ompi_predefined_communicator_t. It's actually a struct that *contains* ompi_communicator_t and then a fixed amount of padding at the end. Since MPI_COMM_WORLD will always be this ompi_predefined_communicator_t, we can ensure that its size stays constant, even if the size of ompi_communicator_t changes. Hence, the size in the hello executable and libmpi.so will be the same. Happiness.

Additionally, casing the MPI_COMM_WORLD instance (in the back end: a variable instance named ompi_mpi_comm_world) from ompi_predefined_communicator_t to ompi_communicator_t is valid, because the first bytes in the struct *are* the same type -- a poor man's C++ inheritance system, per se (or simply using the correct _predefined struct member).

Note that as you can probably guess from the struct names, this *only* affects the MPI pre-defined handles (e.g., MPI_COMM_WORLD). It does not affect user-defined communicators. Hence, pre-defined handles take up some extra padding space at the end, but user-defined communicators do not.

That's the rationale for why we did what we did.

The problem is that, with optimization enabled, the compiler can assume
that "an object of one type is assumed never to reside at the same
address as an object of a different type, unless the types are almost
the same" (gcc info page). In this case ompi_predefined_*_t are not
fully defined by the time the C++ compiler expands the macro
MPI_COMM_WORLD to:
((ompi_communicator_t *)&(ompi_mpi_comm_world))


Per above, this was intentional.  Hope my explanations made sense.

The compiler complains because ompi_mpi_comm_world is declared as an
"extern struct ompi_predefined_communicator_t" but the type is
incomplete, so it can't tell whether the cast is a permissible
almost-the-same type pun (e.g. an "int" can alias an "unsigned").

I think this is potentially a serious performance issue for anyone using OpenMPI in a C++ environment, and the profuse warnings preclude it's use
in our build system.


I agree that the warnings are Bad.

The question is -- why do they only show up in gcc 4.1? More specifically -- why do they *not* show up in other versions of gcc? Is it a gcc 4.1 compiler bug?

The bad news is that the only work around I have is to insert (void *)
casts between (MPI_TYPENAME) and the address operator, e.g.:
#define MPI_COMM_WORLD (((MPI_Comm)(void *)&(ompi_mpi_comm_world)))


Hmm.  I guess that's plausible, but ugly.

An alternative might be to make the full type definition available by
#including some of the internal developer headers such as
ompi/communicator/communicator.h



I'm not sure what the right solution is, but that is not attractive. :-)

--
Jeff Squyres
Cisco Systems


Reply via email to