Hi,

I encountered an assertion failure in Open MPI trunk and found a bug.

See the attached program, which can be run with mpiexec -n 1.
It calls MPI_Put to write one int value to the target side.
The target-side datatype is equivalent to MPI_INT, but it is a derived
datatype created with MPI_Type_contiguous and MPI_Type_dup.

This program aborts with the following output.

==========================================================================
#### dt1 (0x2626160) ####
type 2 count ints 1 count disp 0 count datatype 1
ints:     1 
types:    MPI_INT 
#### dt2 (0x2626340) ####
type 1 count ints 0 count disp 0 count datatype 1
types:    0x2626160 
put_dup_type: ../../../ompi/datatype/ompi_datatype_args.c:565: __ompi_datatype_create_from_packed_description: Assertion `data_id < 45' failed.
[ppc:05244] *** Process received signal ***
[ppc:05244] Signal: Aborted (6)
[ppc:05244] Signal code:  (-6)
[ppc:05244] [ 0] /lib/libpthread.so.0(+0xeff0) [0x7fe58a275ff0]
[ppc:05244] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7fe589f371b5]
[ppc:05244] [ 2] /lib/libc.so.6(abort+0x180) [0x7fe589f39fc0]
[ppc:05244] [ 3] /lib/libc.so.6(__assert_fail+0xf1) [0x7fe589f30301]
[ppc:05244] [ 4] /ompi/lib/libmpi.so.0(+0x6504e) [0x7fe58a4e804e]
[ppc:05244] [ 5] /ompi/lib/libmpi.so.0(ompi_datatype_create_from_packed_description+0x23) [0x7fe58a4e8cf6]
[ppc:05244] [ 6] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd04b) [0x7fe5839a104b]
[ppc:05244] [ 7] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_sendreq_recv_put+0xa8) [0x7fe5839a3ae5]
[ppc:05244] [ 8] /ompi/lib/openmpi/mca_osc_rdma.so(+0x86cc) [0x7fe58399c6cc]
[ppc:05244] [ 9] /ompi/lib/openmpi/mca_btl_self.so(mca_btl_self_send+0x87) [0x7fe58510bb04]
[ppc:05244] [10] /ompi/lib/openmpi/mca_osc_rdma.so(+0xc44b) [0x7fe5839a044b]
[ppc:05244] [11] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd69d) [0x7fe5839a169d]
[ppc:05244] [12] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_flush+0x50) [0x7fe5839a1776]
[ppc:05244] [13] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_module_fence+0x8e6) [0x7fe5839a84ab]
[ppc:05244] [14] /ompi/lib/libmpi.so.0(MPI_Win_fence+0x16f) [0x7fe58a54127d]
[ppc:05244] [15] ompi-trunk/put_dup_type() [0x400d10]
[ppc:05244] [16] /lib/libc.so.6(__libc_start_main+0xfd) [0x7fe589f23c8d]
[ppc:05244] [17] put_dup_type() [0x400b09]
[ppc:05244] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 5244 on node ppc exited on signal 6 (Aborted).
--------------------------------------------------------------------------
==========================================================================

The __ompi_datatype_create_from_packed_description function, in which the
assertion failure occurred, seems to expect data_id to be the ID of a
predefined datatype (the assertion checks `data_id < 45'). In my
environment, data_id is 68, which is the ID of the datatype created by
MPI_Type_contiguous.

In one-sided communication, the target-side datatype is encoded into a
'description' on the origin side and then decoded on the target side.
I think there are problems in both the encoding stage and the decoding
stage.
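
A hypothetical toy model can make the two stages concrete. It is a simplified encoding written only for illustration; it is not Open MPI's actual packed format, and the names toy_type, encode_buggy, decode_buggy and the TOY_* constants are made up, as are the IDs 6 (for MPI_INT) and 69. Only the ID 68 and the limit 45 come from this report. The model mimics the behavior described here: the DUP case records just the ID of its base type, and the decoder treats that ID as a predefined one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, NOT Open MPI's real packed format. */
#define TOY_MAX_PREDEFINED 45 /* mirrors the `data_id < 45' assertion */
enum { TOY_NAMED = 0, TOY_DUP = 1, TOY_CONTIGUOUS = 2 };

typedef struct toy_type {
    int32_t id;                 /* < TOY_MAX_PREDEFINED for predefined types */
    int32_t combiner;           /* TOY_NAMED, TOY_DUP or TOY_CONTIGUOUS */
    int32_t count;              /* element count, used by TOY_CONTIGUOUS only */
    const struct toy_type *old; /* base type, NULL for predefined types */
} toy_type;

/* Stand-ins for MPI_INT, dt1 and dt2 (IDs 6 and 69 are made up). */
static const toy_type toy_int = { 6, TOY_NAMED, 0, NULL };
static const toy_type toy_dt1 = { 68, TOY_CONTIGUOUS, 1, &toy_int };
static const toy_type toy_dt2 = { 69, TOY_DUP, 0, &toy_dt1 };

/* Encode t into buf and return the number of int32_t slots written.
 * The DUP case writes only the base type's ID, dropping its description. */
static int encode_buggy(const toy_type *t, int32_t *buf)
{
    int n = 0;
    buf[n++] = t->combiner;
    switch (t->combiner) {
    case TOY_NAMED:
        buf[n++] = t->id;
        break;
    case TOY_DUP:
        buf[n++] = t->old->id; /* base type is NOT described */
        break;
    case TOY_CONTIGUOUS:
        buf[n++] = t->count;
        n += encode_buggy(t->old, buf + n);
        break;
    default:
        break;
    }
    return n;
}

/* Decode one description starting at buf[*pos]. Returns 0 on success,
 * or -1 where the real code would fail its assertion. */
static int decode_buggy(const int32_t *buf, int *pos)
{
    int32_t combiner = buf[(*pos)++];
    switch (combiner) {
    case TOY_NAMED:
    case TOY_DUP: {
        int32_t data_id = buf[(*pos)++]; /* assumed to be predefined */
        return data_id < TOY_MAX_PREDEFINED ? 0 : -1;
    }
    case TOY_CONTIGUOUS:
        (*pos)++; /* skip count */
        return decode_buggy(buf, pos);
    default:
        return -1;
    }
}
```

In this model, encoding toy_dt2 yields just {TOY_DUP, 68}, and decoding it fails, while a dup of the predefined toy_int round-trips fine, which matches why only a dup of a derived type triggers the problem.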

The __ompi_datatype_pack_description function in
ompi/datatype/ompi_datatype_args.c encodes the datatype.
For MPI_COMBINER_DUP (line 451), it encodes only create_type and id
and returns immediately. It does not encode any information about the
base datatype (in my case, the datatype created by MPI_Type_contiguous).
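
One possible direction for the encoding side, sketched in the same hypothetical toy model (not Open MPI code; toy_type, encode_fixed, the TOY_* constants and the IDs 6 and 69 are made up): let the DUP case embed the full description of its base type, recursing the same way the CONTIGUOUS case already does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, NOT Open MPI's real packed format. */
enum { TOY_NAMED = 0, TOY_DUP = 1, TOY_CONTIGUOUS = 2 };

typedef struct toy_type {
    int32_t id;                 /* predefined types use small IDs */
    int32_t combiner;           /* TOY_NAMED, TOY_DUP or TOY_CONTIGUOUS */
    int32_t count;              /* element count, used by TOY_CONTIGUOUS only */
    const struct toy_type *old; /* base type, NULL for predefined types */
} toy_type;

/* Stand-ins for MPI_INT, dt1 and dt2 (IDs 6 and 69 are made up). */
static const toy_type toy_int = { 6, TOY_NAMED, 0, NULL };
static const toy_type toy_dt1 = { 68, TOY_CONTIGUOUS, 1, &toy_int };
static const toy_type toy_dt2 = { 69, TOY_DUP, 0, &toy_dt1 };

/* Encode t into buf and return the number of int32_t slots written.
 * TOY_DUP now embeds the complete description of its base type, so
 * the target can rebuild it even when the base type is derived. */
static int encode_fixed(const toy_type *t, int32_t *buf)
{
    int n = 0;
    buf[n++] = t->combiner;
    switch (t->combiner) {
    case TOY_NAMED:
        buf[n++] = t->id;
        break;
    case TOY_DUP:
        n += encode_fixed(t->old, buf + n); /* nested base description */
        break;
    case TOY_CONTIGUOUS:
        buf[n++] = t->count;
        n += encode_fixed(t->old, buf + n);
        break;
    default:
        break;
    }
    return n;
}
```

With this change toy_dt2 encodes as {TOY_DUP, TOY_CONTIGUOUS, 1, TOY_NAMED, 6}. The trade-off is that even a dup of a predefined type now takes one extra slot, so any precomputed buffer size must follow the same recursion.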

The __ompi_datatype_create_from_packed_description function in
ompi/datatype/ompi_datatype_args.c decodes the description.
For MPI_COMBINER_DUP (line 557), it assumes that data_id is the ID of
a predefined datatype, which is not always true.
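
The matching decoding side, again as a hypothetical toy-model sketch rather than a patch to Open MPI (decode_fixed and the TOY_* constants are made up), would recurse into the embedded base description for DUP and check for a predefined ID only in the named case. To have something observable, this sketch returns how many predefined elements the rebuilt type spans.

```c
#include <stdint.h>

/* Hypothetical toy model, NOT Open MPI's real packed format. */
#define TOY_MAX_PREDEFINED 45
enum { TOY_NAMED = 0, TOY_DUP = 1, TOY_CONTIGUOUS = 2 };

/* Decode one description from buf[*pos .. len) and return how many
 * predefined elements the rebuilt type spans, or -1 on malformed input.
 * TOY_DUP recurses into the embedded base description instead of
 * assuming that the next value is a predefined type ID. */
static int32_t decode_fixed(const int32_t *buf, int len, int *pos)
{
    if (*pos >= len)
        return -1;
    int32_t combiner = buf[(*pos)++];
    switch (combiner) {
    case TOY_NAMED: {
        if (*pos >= len)
            return -1;
        int32_t data_id = buf[(*pos)++];
        /* Only here must the ID refer to a predefined type. */
        return (data_id >= 0 && data_id < TOY_MAX_PREDEFINED) ? 1 : -1;
    }
    case TOY_DUP:
        /* A duplicate spans exactly what its base type spans. */
        return decode_fixed(buf, len, pos);
    case TOY_CONTIGUOUS: {
        if (*pos >= len)
            return -1;
        int32_t count = buf[(*pos)++];
        int32_t inner = decode_fixed(buf, len, pos);
        return (inner < 0 || count < 0) ? -1 : count * inner;
    }
    default:
        return -1;
    }
}
```

Note that a description in the old form, {TOY_DUP, 68}, is rejected cleanly here instead of aborting, since 68 is not a known combiner in this model.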

I cannot fix this problem yet because I'm not familiar with the datatype
code in Open MPI. MPI_COMBINER_DUP is also used for predefined datatypes,
and the calculation of total_pack_size is affected as well, so the fix
does not seem simple.
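
Why total_pack_size gets involved can be seen in the same hypothetical toy model (not Open MPI code; packed_slots, packed_bytes and the TOY_* constants are made up): once DUP embeds its base description, the packed size depends on the whole chain of base types, so the size computation has to recurse exactly like the encoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, NOT Open MPI's real packed format. */
enum { TOY_NAMED = 0, TOY_DUP = 1, TOY_CONTIGUOUS = 2 };

typedef struct toy_type {
    int32_t id;                 /* predefined types use small IDs */
    int32_t combiner;           /* TOY_NAMED, TOY_DUP or TOY_CONTIGUOUS */
    int32_t count;              /* element count, used by TOY_CONTIGUOUS only */
    const struct toy_type *old; /* base type, NULL for predefined types */
} toy_type;

/* Number of int32_t slots needed to pack t when TOY_DUP embeds its base
 * description. This must mirror the encoder exactly, otherwise a buffer
 * allocated from it would be too small. */
static size_t packed_slots(const toy_type *t)
{
    switch (t->combiner) {
    case TOY_NAMED:
        return 2;                        /* combiner + id */
    case TOY_DUP:
        return 1 + packed_slots(t->old); /* combiner + base description */
    case TOY_CONTIGUOUS:
        return 2 + packed_slots(t->old); /* combiner + count + base */
    default:
        return 1;                        /* combiner only */
    }
}

/* Total packed size in bytes for t. */
static size_t packed_bytes(const toy_type *t)
{
    return packed_slots(t) * sizeof(int32_t);
}
```

For the reproducer's dt2 this gives 5 slots, whereas the old DUP encoding needed only 2 regardless of what the base type was.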

Regards,
KAWASHIMA Takahiro
#include <stdint.h>
#include <stdio.h>
#include <mpi.h>

#define PRINT_ARGS

#ifdef PRINT_ARGS
/* defined in ompi/datatype/ompi_datatype_args.c */
extern int32_t ompi_datatype_print_args(const struct ompi_datatype_t *pData);
#endif

int main(int argc, char *argv[])
{
    MPI_Win win;
    MPI_Datatype dt1, dt2;
    int obuf[1], tbuf[1];

    obuf[0] = 77;
    tbuf[0] = 88;

    MPI_Init(&argc, &argv);

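    /* dt2 is an MPI_Type_dup duplicate of dt1, a derived type equivalent
     * to MPI_INT; only dt2 is committed because only dt2 is used below. */
    MPI_Type_contiguous(1, MPI_INT, &dt1);
    MPI_Type_dup(dt1, &dt2);
    MPI_Type_commit(&dt2);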

#ifdef PRINT_ARGS
    printf("#### dt1 (%p) ####\n", (void *)dt1);
    ompi_datatype_print_args(dt1);
    printf("#### dt2 (%p) ####\n", (void *)dt2);
    ompi_datatype_print_args(dt2);
    fflush(stdout);
#endif

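    /* Put one int from obuf into tbuf (rank 0, displacement 0) with dt2
     * as the target datatype. The put is completed by the second fence,
     * which is where the assertion fires (see the backtrace). */
    MPI_Win_create(tbuf, sizeof(int), 1, MPI_INFO_NULL, MPI_COMM_SELF, &win);
    MPI_Win_fence(0, win);
    MPI_Put(obuf, 1, MPI_INT, 0, 0, 1, dt2, win);
    MPI_Win_fence(0, win);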

    MPI_Type_free(&dt1);
    MPI_Type_free(&dt2);
    MPI_Win_free(&win);
    MPI_Finalize();

    if (tbuf[0] == 77) {
        printf("OK\n");
    } else {
        printf("NG\n");
    }
    fflush(stdout);

    return 0;
}
