Damian,

As Gilles indicated, an example would be great. Meanwhile, since you can
already reach the root cause with a debugger, can you check which branch of
the if on the communicator type is taken in the
ompi_coll_base_retain_datatypes_w function? What is the communicator
type: intra or inter? With or without topology?
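
To make the suspected mismatch concrete, here is a small standalone C
model of the two counting rules Damian describes. It is only a sketch and
not Open MPI source: the function names are hypothetical, and it assumes
that the topology branch uses the Cartesian neighbour count (2 per
dimension, as the MPI standard defines for neighbourhood collectives)
while the caller sized sendtypes/recvtypes for a plain alltoallw (one
entry per peer).

```c
/* Hypothetical model, NOT Open MPI code: it only counts how many
 * datatype entries would be walked under each rule. */

/* Cartesian neighbourhood: 2 neighbours per dimension. */
static int cart_neighbor_entries(int ndims)
{
    return 2 * ndims;
}

/* Plain (I)alltoallw: one entry per peer, i.e. the communicator size
 * for an intra-communicator, the remote size for an inter-communicator. */
static int alltoallw_entries(int is_inter, int local_size, int remote_size)
{
    return is_inter ? remote_size : local_size;
}

/* Number of entries read past the end of the caller's arrays. */
static int overread(int allocated, int walked)
{
    return walked > allocated ? walked - allocated : 0;
}
```

For the reported case (1-D sub-communicator from MPI_Cart_sub, size 1),
alltoallw_entries(0, 1, 0) gives 1 allocated entry, while
cart_neighbor_entries(1) gives 2 walked entries, so overread(1, 2) is 1.
That would be consistent with the invalid read, if the topology branch is
indeed the one taken.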

Thanks,
  George.


On Wed, May 4, 2022 at 9:35 AM Gilles Gouaillardet via devel <
devel@lists.open-mpi.org> wrote:

> Damian,
>
> Thanks for the report!
>
> Could you please trim your program and share it so I can have a look?
>
>
> Cheers,
>
> Gilles
>
>
> On Wed, May 4, 2022 at 10:27 PM Damian Marek via devel <
> devel@lists.open-mpi.org> wrote:
>
>> Hello,
>>
>> I have been getting intermittent memory corruption and segmentation
>> faults while using Ialltoallw in Open MPI v4.0.3. Valgrind also reports an
>> invalid read in the "ompi_coll_base_retain_datatypes_w" function defined in
>> "coll_base_util.c".
>>
>> With a debug build of Open MPI, an assertion fails as well:
>>
>> base/coll_base_util.c:274: ompi_coll_base_retain_datatypes_w: Assertion
>> `OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (stypes[i]))->obj_magic_id' failed.
>>
>> I think it is related to the fact that I am using a communicator created
>> with a 2D MPI_Cart_create, followed by getting 2 subcommunicators from
>> MPI_Cart_sub; in some cases one of the dimensions is 1. In
>> "ompi_coll_base_retain_datatypes_w" the neighbour count is used to find
>> "rcount" and "scount" at line 267. In my bug case it returns 2 for both,
>> but I believe it should be 1, since that is the comm size and the number
>> of entries I have allocated for sendtypes and recvtypes. Then an invalid
>> read happens at lines 274 and 280.
>>
>> Regards,
>> Damian
