Hi, I encountered an assertion failure in Open MPI trunk and found a bug.
See the attached program, which can be run with mpiexec -n 1. It calls MPI_Put to write one int value to the target side. The target-side datatype is equivalent to MPI_INT, but it is a derived datatype created with MPI_Type_contiguous followed by MPI_Type_dup. The program aborts with the following output:

==========================================================================
#### dt1 (0x2626160) ####
type 2 count ints 1 count disp 0 count datatype 1
ints: 1
types: MPI_INT
#### dt2 (0x2626340) ####
type 1 count ints 0 count disp 0 count datatype 1
types: 0x2626160
put_dup_type: ../../../ompi/datatype/ompi_datatype_args.c:565: __ompi_datatype_create_from_packed_description: Assertion `data_id < 45' failed.
[ppc:05244] *** Process received signal ***
[ppc:05244] Signal: Aborted (6)
[ppc:05244] Signal code: (-6)
[ppc:05244] [ 0] /lib/libpthread.so.0(+0xeff0) [0x7fe58a275ff0]
[ppc:05244] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7fe589f371b5]
[ppc:05244] [ 2] /lib/libc.so.6(abort+0x180) [0x7fe589f39fc0]
[ppc:05244] [ 3] /lib/libc.so.6(__assert_fail+0xf1) [0x7fe589f30301]
[ppc:05244] [ 4] /ompi/lib/libmpi.so.0(+0x6504e) [0x7fe58a4e804e]
[ppc:05244] [ 5] /ompi/lib/libmpi.so.0(ompi_datatype_create_from_packed_description+0x23) [0x7fe58a4e8cf6]
[ppc:05244] [ 6] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd04b) [0x7fe5839a104b]
[ppc:05244] [ 7] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_sendreq_recv_put+0xa8) [0x7fe5839a3ae5]
[ppc:05244] [ 8] /ompi/lib/openmpi/mca_osc_rdma.so(+0x86cc) [0x7fe58399c6cc]
[ppc:05244] [ 9] /ompi/lib/openmpi/mca_btl_self.so(mca_btl_self_send+0x87) [0x7fe58510bb04]
[ppc:05244] [10] /ompi/lib/openmpi/mca_osc_rdma.so(+0xc44b) [0x7fe5839a044b]
[ppc:05244] [11] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd69d) [0x7fe5839a169d]
[ppc:05244] [12] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_flush+0x50) [0x7fe5839a1776]
[ppc:05244] [13] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_module_fence+0x8e6) [0x7fe5839a84ab]
[ppc:05244] [14] /ompi/lib/libmpi.so.0(MPI_Win_fence+0x16f) [0x7fe58a54127d]
[ppc:05244] [15] ompi-trunk/put_dup_type() [0x400d10]
[ppc:05244] [16] /lib/libc.so.6(__libc_start_main+0xfd) [0x7fe589f23c8d]
[ppc:05244] [17] put_dup_type() [0x400b09]
[ppc:05244] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 0 with PID 5244 on node ppc exited on signal 6 (Aborted).
--------------------------------------------------------------------------
==========================================================================

The __ompi_datatype_create_from_packed_description function, in which the assertion failure occurred, seems to expect that the value of data_id is the ID of a predefined datatype. In my environment, the value of data_id is 68, which is the ID of the datatype created by MPI_Type_contiguous.

In one-sided communication, the target-side datatype is encoded as a 'description' on the origin side and then decoded on the target side. I think there are problems in both the encoding stage and the decoding stage.

The __ompi_datatype_pack_description function in ompi/datatype/ompi_datatype_args.c encodes the datatype. For MPI_COMBINER_DUP (line 451), it encodes only create_type and id and returns immediately. It does not encode the information of the base datatype (in my case, the datatype created by MPI_Type_contiguous).

The __ompi_datatype_create_from_packed_description function in the same file decodes the description. For MPI_COMBINER_DUP (line 557), it expects the value of data_id to be the ID of a predefined datatype, which is not always true.

I cannot fix this problem yet because I'm not familiar with the datatype code in Open MPI. MPI_COMBINER_DUP is also used for predefined datatypes, and the calculation of total_pack_size is also involved, so the fix does not seem simple.

Regards,
KAWASHIMA Takahiro
/* Reproducer: run with mpiexec -n 1 */
#include <stdint.h>
#include <stdio.h>
#include <mpi.h>

#define PRINT_ARGS

#ifdef PRINT_ARGS
/* defined in ompi/datatype/ompi_datatype_args.c */
extern int32_t ompi_datatype_print_args(const struct ompi_datatype_t *pData);
#endif

int main(int argc, char *argv[])
{
    MPI_Win win;
    MPI_Datatype dt1, dt2;
    int obuf[1], tbuf[1];

    obuf[0] = 77;
    tbuf[0] = 88;

    MPI_Init(&argc, &argv);

    /* dt2 is a dup of a contiguous type equivalent to MPI_INT */
    MPI_Type_contiguous(1, MPI_INT, &dt1);
    MPI_Type_dup(dt1, &dt2);
    MPI_Type_commit(&dt2);

#ifdef PRINT_ARGS
    printf("#### dt1 (%p) ####\n", (void *)dt1);
    ompi_datatype_print_args(dt1);
    printf("#### dt2 (%p) ####\n", (void *)dt2);
    ompi_datatype_print_args(dt2);
    fflush(stdout);
#endif

    MPI_Win_create(tbuf, sizeof(int), 1, MPI_INFO_NULL, MPI_COMM_SELF, &win);

    MPI_Win_fence(0, win);
    /* write obuf[0] into tbuf[0] on rank 0 (self), with dt2 as the target datatype */
    MPI_Put(obuf, 1, MPI_INT, 0, 0, 1, dt2, win);
    MPI_Win_fence(0, win);

    MPI_Type_free(&dt1);
    MPI_Type_free(&dt2);
    MPI_Win_free(&win);

    MPI_Finalize();

    if (tbuf[0] == 77) {
        printf("OK\n");
    } else {
        printf("NG\n");
    }
    fflush(stdout);

    return 0;
}