Damian,

As Gilles indicated, an example would be great. Meanwhile, since you already have access to the root cause with a debugger, can you check which branch of the `if` on the communicator type is taken in the ompi_coll_base_retain_datatypes_w function? What is the communicator type: intra or inter? With or without topology?

Thanks,
George.

On Wed, May 4, 2022 at 9:35 AM Gilles Gouaillardet via devel <devel@lists.open-mpi.org> wrote:

> Damian,
>
> Thanks for the report!
>
> could you please trim your program and share it so I can have a look?
>
> Cheers,
>
> Gilles
>
> On Wed, May 4, 2022 at 10:27 PM Damian Marek via devel <devel@lists.open-mpi.org> wrote:
>
>> Hello,
>>
>> I have been getting intermittent memory corruptions and segmentation
>> faults while using Ialltoallw in OpenMPI v4.0.3. Valgrind also reports an
>> invalid read in the "ompi_coll_base_retain_datatypes_w" function defined in
>> "coll_base_util.c".
>>
>> Running with a debug build of ompi an assertion fails as well:
>>
>> base/coll_base_util.c:274: ompi_coll_base_retain_datatypes_w: Assertion
>> `OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (stypes[i]))->obj_magic_id' failed.
>>
>> I think it is related to the fact that I am using a communicator created
>> with 2D MPI_Cart_create followed by getting 2 subcommunicators from
>> MPI_Cart_sub, in some cases one of the dimensions is 1. In
>> "ompi_coll_base_retain_datatypes_w" the neighbour count is used to find
>> "rcount" and "scount" at line 267. In my bug case it returns 2 for both,
>> but I believe it should be 1 since that is the comm size and the amount of
>> memory I have allocated for sendtypes and recvtypes. Then, an invalid read
>> happens at 274 and 280.
>>
>> Regards,
>> Damian
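
To make the suspected mismatch in the report concrete, here is a small plain-C model of the counting logic. It is only a sketch, not Open MPI source: the struct and the function names (comm_model, entries_by_comm_type, entries_required_by_alltoallw, would_overread) are hypothetical, and the assumption that the helper sizes its loops by the neighbor count whenever the communicator carries a topology is inferred from the report, not confirmed. What it relies on from the MPI standard is that a Cartesian communicator has 2*ndims neighbors, while (I)alltoallw on an intra-communicator takes one datatype entry per rank.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a communicator; none of these names exist in Open MPI. */
enum comm_kind { COMM_INTRA, COMM_INTER, COMM_CART };

struct comm_model {
    enum comm_kind kind;
    int local_size;   /* number of ranks in the (local) group */
    int remote_size;  /* only meaningful for COMM_INTER */
    int ndims;        /* only meaningful for COMM_CART */
};

/* Entries a helper would touch if it picked the count from the
 * communicator type alone (assumed behaviour, see the lead-in). */
int entries_by_comm_type(const struct comm_model *c)
{
    assert(c != NULL);
    switch (c->kind) {
    case COMM_CART:
        return 2 * c->ndims;     /* Cartesian neighbor count */
    case COMM_INTER:
        return c->remote_size;
    default:
        return c->local_size;
    }
}

/* Entries the caller of (I)alltoallw has to provide in sendtypes/recvtypes. */
int entries_required_by_alltoallw(const struct comm_model *c)
{
    assert(c != NULL);
    return c->kind == COMM_INTER ? c->remote_size : c->local_size;
}

/* Nonzero when the loop would read past the arrays the caller allocated. */
int would_overread(const struct comm_model *c)
{
    return entries_by_comm_type(c) > entries_required_by_alltoallw(c);
}
```

For a 1D sub-communicator of size 1 obtained from MPI_Cart_sub (kind COMM_CART, local_size 1, ndims 1), the model gives 2 entries touched against 1 entry allocated, which matches the 2 versus 1 described in the report. A plain intra-communicator never overreads in this model.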