Hello, I have been getting intermittent memory corruption and segmentation faults while using MPI_Ialltoallw in Open MPI v4.0.3. Valgrind also reports an invalid read in the "ompi_coll_base_retain_datatypes_w" function defined in "coll_base_util.c".
With a debug build of Open MPI, an assertion fails as well:

base/coll_base_util.c:274: ompi_coll_base_retain_datatypes_w: Assertion `OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (stypes[i]))->obj_magic_id' failed.

I think it is related to the communicators I am using: I create a 2D Cartesian communicator with MPI_Cart_create and then get 2 subcommunicators from it with MPI_Cart_sub. In some cases one of the dimensions is 1.

In "ompi_coll_base_retain_datatypes_w", the neighbour count is used to set "rcount" and "scount" at line 267. In my failing case it returns 2 for both, but I believe it should be 1, since that is the communicator size and the number of entries I have allocated for sendtypes and recvtypes. An invalid read then happens at lines 274 and 280.

Regards,
Damian
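P.S. To make the suspected mismatch concrete, here is a small stand-alone sketch in plain C with no MPI calls. It assumes that the neighbour count of a Cartesian communicator is 2 per dimension, as for MPI neighbourhood collectives, and that MPI_Ialltoallw expects one datatype entry per rank. The helper names are made up for illustration and are not Open MPI functions.

```c
#include <assert.h>

/* Hypothetical helper: neighbours of a rank in an ndims-dimensional
 * Cartesian topology (assumed: one in each direction per dimension). */
int cart_neighbor_count(int ndims)
{
    assert(ndims >= 0);
    return 2 * ndims;
}

/* Hypothetical helper: entries the caller must provide in sendtypes and
 * recvtypes for MPI_Ialltoallw (assumed: one per rank in the communicator). */
int alltoallw_type_count(int comm_size)
{
    assert(comm_size >= 1);
    return comm_size;
}

/* Number of datatype entries that would be read past the end of the
 * caller's arrays if the loop bound were the neighbour count instead of
 * the communicator size. Returns 0 when no over-read would occur. */
int entries_read_past_end(int ndims, int comm_size)
{
    int iterated  = cart_neighbor_count(ndims);
    int allocated = alltoallw_type_count(comm_size);
    return iterated > allocated ? iterated - allocated : 0;
}
```

For a 1D subcommunicator of size 1, entries_read_past_end(1, 1) is 1 under these assumptions, which would be consistent with the counts of 2 I see and the invalid reads on the second entry.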
<<attachment: mpi-info.zip>>