Takahiro,

Nice catch. That particular code was an over-optimization … that failed.
Please try with the patch below.

Let me know if it works as expected; I will push it to the trunk once
confirmed.

  George.


Index: ompi/datatype/ompi_datatype_args.c
===================================================================
--- ompi/datatype/ompi_datatype_args.c  (revision 28787)
+++ ompi/datatype/ompi_datatype_args.c  (working copy)
@@ -449,9 +449,10 @@
     }
     /* For duplicated datatype we don't have to store all the information */
     if( MPI_COMBINER_DUP == args->create_type ) {
-        position[0] = args->create_type;
-        position[1] = args->d[0]->id; /* On the OMPI - layer, copy the ompi_datatype.id */
-        return OMPI_SUCCESS;
+        ompi_datatype_t* temp_data = args->d[0];
+        return __ompi_datatype_pack_description(temp_data,
+                                                packed_buffer,
+                                                next_index );
     }
     position[0] = args->create_type;
     position[1] = args->ci;
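
To illustrate what the patch changes, here is a toy model in plain C. It is
only a sketch and not the Open MPI code: the toy_* names, the struct layout
and the description format (predefined types as [DUP, id], contiguous types
as [CONTIGUOUS, count, base description]) are assumptions made for the
example. The only value taken from the report is the limit of 45 from the
failed assertion.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name below is hypothetical, not an Open MPI symbol. */
enum { TOY_PREDEFINED_MAX = 45 };  /* mirrors the `data_id < 45` assertion */
enum { TOY_COMBINER_DUP = 1, TOY_COMBINER_CONTIGUOUS = 2 };

typedef struct toy_type {
    int id;                        /* ids below TOY_PREDEFINED_MAX are predefined */
    int combiner;                  /* how a derived type was created */
    int count;                     /* used by CONTIGUOUS only */
    const struct toy_type *base;   /* NULL for predefined types */
} toy_type;

/* Encode `t` into buf and return the number of ints written.
 * recurse_on_dup == 0 models the old shortcut, != 0 models the patch. */
static size_t toy_pack(const toy_type *t, int *buf, int recurse_on_dup)
{
    if (NULL == t->base) {                  /* predefined: [DUP, id] */
        buf[0] = TOY_COMBINER_DUP;
        buf[1] = t->id;
        return 2;
    }
    if (TOY_COMBINER_DUP == t->combiner) {
        if (recurse_on_dup)                 /* patched: describe the base type */
            return toy_pack(t->base, buf, recurse_on_dup);
        buf[0] = TOY_COMBINER_DUP;          /* old: store only the base id, */
        buf[1] = t->base->id;               /* which may be a derived id    */
        return 2;
    }
    /* contiguous: [CONTIGUOUS, count, <base description>] */
    buf[0] = TOY_COMBINER_CONTIGUOUS;
    buf[1] = t->count;
    return 2 + toy_pack(t->base, buf + 2, recurse_on_dup);
}

/* Decode a description. Returns the number of ints consumed, or 0 if the
 * description cannot be decoded. On success *elems holds the number of
 * predefined elements described. */
static size_t toy_unpack(const int *buf, int *elems)
{
    if (TOY_COMBINER_DUP == buf[0]) {
        if (buf[1] >= TOY_PREDEFINED_MAX)   /* the real decoder asserts here */
            return 0;
        *elems = 1;
        return 2;
    }
    if (TOY_COMBINER_CONTIGUOUS == buf[0]) {
        int inner = 0;
        size_t used = toy_unpack(buf + 2, &inner);
        if (0 == used)
            return 0;
        *elems = buf[1] * inner;
        return 2 + used;
    }
    return 0;
}
```

For a type shaped like dt2 in the report (a dup of a contiguous type holding
one predefined element), the old path writes two ints that the decoder
rejects, while the patched path writes four ints that decode to one element.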



On Jul 14, 2013, at 14:30, KAWASHIMA Takahiro <rivis.kawash...@nifty.com> wrote:

> Hi,
> 
> I encountered an assertion failure in Open MPI trunk and found a bug.
> 
> See the attached program. This program can be run with mpiexec -n 1.
> This program calls MPI_Put and writes one int value to the target side.
> The target side datatype is equivalent to MPI_INT, but is a derived
> datatype created by MPI_Type_contiguous and MPI_Type_dup.
> 
> This program aborts with the following output.
> 
> ==========================================================================
> #### dt1 (0x2626160) ####
> type 2 count ints 1 count disp 0 count datatype 1
> ints:     1 
> types:    MPI_INT 
> #### dt2 (0x2626340) ####
> type 1 count ints 0 count disp 0 count datatype 1
> types:    0x2626160 
> put_dup_type: ../../../ompi/datatype/ompi_datatype_args.c:565: __ompi_datatype_create_from_packed_description: Assertion `data_id < 45' failed.
> [ppc:05244] *** Process received signal ***
> [ppc:05244] Signal: Aborted (6)
> [ppc:05244] Signal code:  (-6)
> [ppc:05244] [ 0] /lib/libpthread.so.0(+0xeff0) [0x7fe58a275ff0]
> [ppc:05244] [ 1] /lib/libc.so.6(gsignal+0x35) [0x7fe589f371b5]
> [ppc:05244] [ 2] /lib/libc.so.6(abort+0x180) [0x7fe589f39fc0]
> [ppc:05244] [ 3] /lib/libc.so.6(__assert_fail+0xf1) [0x7fe589f30301]
> [ppc:05244] [ 4] /ompi/lib/libmpi.so.0(+0x6504e) [0x7fe58a4e804e]
> [ppc:05244] [ 5] /ompi/lib/libmpi.so.0(ompi_datatype_create_from_packed_description+0x23) [0x7fe58a4e8cf6]
> [ppc:05244] [ 6] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd04b) [0x7fe5839a104b]
> [ppc:05244] [ 7] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_sendreq_recv_put+0xa8) [0x7fe5839a3ae5]
> [ppc:05244] [ 8] /ompi/lib/openmpi/mca_osc_rdma.so(+0x86cc) [0x7fe58399c6cc]
> [ppc:05244] [ 9] /ompi/lib/openmpi/mca_btl_self.so(mca_btl_self_send+0x87) [0x7fe58510bb04]
> [ppc:05244] [10] /ompi/lib/openmpi/mca_osc_rdma.so(+0xc44b) [0x7fe5839a044b]
> [ppc:05244] [11] /ompi/lib/openmpi/mca_osc_rdma.so(+0xd69d) [0x7fe5839a169d]
> [ppc:05244] [12] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_flush+0x50) [0x7fe5839a1776]
> [ppc:05244] [13] /ompi/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_module_fence+0x8e6) [0x7fe5839a84ab]
> [ppc:05244] [14] /ompi/lib/libmpi.so.0(MPI_Win_fence+0x16f) [0x7fe58a54127d]
> [ppc:05244] [15] ompi-trunk/put_dup_type() [0x400d10]
> [ppc:05244] [16] /lib/libc.so.6(__libc_start_main+0xfd) [0x7fe589f23c8d]
> [ppc:05244] [17] put_dup_type() [0x400b09]
> [ppc:05244] *** End of error message ***
> --------------------------------------------------------------------------
> mpiexec noticed that process rank 0 with PID 5244 on node ppc exited on 
> signal 6 (Aborted).
> --------------------------------------------------------------------------
> ==========================================================================
> 
> The __ompi_datatype_create_from_packed_description function, in which the
> assertion failure occurred, seems to expect data_id to be the ID of a
> predefined datatype. In my environment, data_id is 68, which is the ID
> of the datatype created by MPI_Type_contiguous.
> 
> In one-sided communication, the target-side datatype is encoded as a
> 'description' on the origin side and then decoded on the target side.
> I think there are problems in both the encoding stage and the decoding
> stage.
> 
> The __ompi_datatype_pack_description function in
> ompi/datatype/ompi_datatype_args.c encodes the datatype.
> For MPI_COMBINER_DUP on line 451, it encodes only create_type and id
> and returns immediately. It doesn't encode the information of the base
> datatype (in my case, the datatype created by MPI_Type_contiguous).
> 
> The __ompi_datatype_create_from_packed_description function in
> ompi/datatype/ompi_datatype_args.c decodes the description.
> For MPI_COMBINER_DUP on line 557, it expects data_id to be the ID of a
> predefined datatype. That is not always true.
> 
> I cannot fix this problem yet because I'm not familiar with the datatype
> code in Open MPI. MPI_COMBINER_DUP is also used for predefined datatypes
> and the calculation of total_pack_size is also involved, so the fix does
> not seem simple.
> 
> Regards,
> KAWASHIMA Takahiro
> <put_dup_type.c>

