Hi,
My colleague and I found three OSC-related bugs in the OMPI datatype code.
One affects trunk and the v1.6/v1.7 branches; the other two affect only the v1.6 branch.
(1) OMPI_DATATYPE_ALIGN_PTR should be placed after memcpy
Last year I reported a bug in the OMPI datatype code, and it was
fixed in r25721. However, the fix was not correct and the problem
still exists.
My original report and patch:
http://www.open-mpi.org/community/lists/devel/2012/01/10207.php
r25721:
https://svn.open-mpi.org/trac/ompi/changeset/25721
OMPI_DATATYPE_ALIGN_PTR should be placed after the memcpy
in the __ompi_datatype_pack_description function, as in the
patch attached to my previous mail.
Sorry, I did not check r25721 carefully when it was committed.
The attached file datatype-align.patch is the correct patch
for the latest trunk. This fix should be applied to trunk
and to the v1.7 and v1.6 branches.
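To illustrate why the ordering matters, here is a minimal sketch, not
the Open MPI code itself. SKETCH_ALIGN_PTR and sketch_next_offset are
hypothetical names; the macro is only an assumed stand-in for
OMPI_DATATYPE_ALIGN_PTR (round up to 8 bytes), and cd and ci play the
roles of args->cd and args->ci.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for OMPI_DATATYPE_ALIGN_PTR (an assumption):
 * round a pointer up to the next 8-byte boundary. */
#define SKETCH_ALIGN_PTR(ptr, type) \
    ((ptr) = (type)(((uintptr_t)(ptr) + 7u) & ~(uintptr_t)7u))

/* Skip cd position slots, copy ci ints, and return the offset at which
 * the next datatype description would start.
 * align_after_copy == 0: align before the memcpy (old ordering).
 * align_after_copy != 0: align after the memcpy (patched ordering).
 * buf must be 8-byte aligned and large enough for the packed data. */
static size_t sketch_next_offset(char *buf, const int *counts,
                                 int cd, int ci, int align_after_copy)
{
    char *next = buf;

    assert(cd >= 0 && ci >= 0);
    next += sizeof(int) * (size_t)cd;      /* space for the positions */

    if (!align_after_copy) {
        SKETCH_ALIGN_PTR(next, char *);    /* old: align too early */
    }

    memcpy(next, counts, sizeof(int) * (size_t)ci);
    next += sizeof(int) * (size_t)ci;

    if (align_after_copy) {
        SKETCH_ALIGN_PTR(next, char *);    /* patched: align last */
    }
    return (size_t)(next - buf);
}
```

Assuming a 4-byte int, cd = 2 and ci = 1 give offset 12 with the old
ordering (not 8-byte aligned) and 16 with the patched ordering.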
(2) r28790 should be merged into v1.6
The trunk changeset r28790 has been merged into v1.7 in r28790
(ticket #3673), but it has not yet been merged into v1.6.
I confirmed that the problem reported last month also occurs in v1.6
and that it can be fixed by merging r28790 into v1.6.
The originally reported problem:
http://www.open-mpi.org/community/lists/devel/2013/07/12595.php
(3) OMPI_DATATYPE_MAX_PREDEFINED should be 46 for v1.6
In the v1.6 branch, ompi/datatype/ompi_datatype.h defines
OMPI_DATATYPE_MAX_PREDEFINED as 45, but the number of
predefined datatypes is 46 and the last predefined
datatype ID (OMPI_DATATYPE_MPI_UB) is 45.
OMPI_DATATYPE_MAX_PREDEFINED is used as the number of
predefined datatypes, i.e. the maximum predefined datatype ID + 1,
not as the maximum predefined datatype ID itself, as shown below.
ompi/op/op.c:79:
// the number of predefined datatypes
int ompi_op_ddt_map[OMPI_DATATYPE_MAX_PREDEFINED];
ompi/datatype/ompi_datatype_args.c:573:
// maximum predefined datatype ID + 1
assert( data_id < OMPI_DATATYPE_MAX_PREDEFINED );
ompi/datatype/ompi_datatype_args.c:492:
// first unused datatype ID
// (= maximum predefined datatype ID + 1)
int next_index = OMPI_DATATYPE_MAX_PREDEFINED;
So its value should be 46 for v1.6.
In fact, r28932 in trunk added one datatype (MPI_Count) but
increased OMPI_DATATYPE_MAX_PREDEFINED from 45 to 47, so the
current trunk is correct.
In v1.6 this bug causes random errors such as SEGV, "Error recreating
datatype", or "received packet for Window with unknown type"
when MPI_UB is used in OSC, as in the attached program osc_ub.c.
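The off-by-one can be sketched in isolation as follows. This is only an
illustration under the numbers given above; the SKETCH_* constants and
both functions are hypothetical names, not Open MPI symbols.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    SKETCH_LAST_PREDEFINED_ID = 45, /* ID of MPI_UB in v1.6, per this report */
    SKETCH_MAX_WRONG          = 45, /* current v1.6 value */
    SKETCH_MAX_FIXED          = 46  /* proposed value: last ID + 1 */
};

/* Mirrors the style of check used in ompi_datatype_args.c:
 * an ID is accepted as predefined only if it is below the maximum. */
static bool sketch_id_is_predefined(int data_id, int max_predefined)
{
    assert(max_predefined > 0);
    return data_id >= 0 && data_id < max_predefined;
}

/* The first ID handed to a non-predefined datatype is the maximum
 * itself, so a too-small maximum collides with the last predefined ID. */
static int sketch_first_free_id(int max_predefined)
{
    return max_predefined;
}
```

With SKETCH_MAX_WRONG, ID 45 is rejected as a predefined datatype and
is also the first "free" ID, and an array of that many elements has no
slot for index 45. With SKETCH_MAX_FIXED, neither problem occurs.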
Regards,
Takahiro Kawashima,
MPI development team,
Fujitsu
Index: ompi/datatype/ompi_datatype_args.c
===================================================================
--- ompi/datatype/ompi_datatype_args.c (revision 29064)
+++ ompi/datatype/ompi_datatype_args.c (working copy)
@@ -467,12 +467,13 @@
position = (int*)next_packed;
next_packed += sizeof(int) * args->cd;
- /* description of next datatype should be 64 bits aligned */
- OMPI_DATATYPE_ALIGN_PTR(next_packed, char*);
/* copy the aray of counts (32 bits aligned) */
memcpy( next_packed, args->i, sizeof(int) * args->ci );
next_packed += args->ci * sizeof(int);
+ /* description of next datatype should be 64 bits aligned */
+ OMPI_DATATYPE_ALIGN_PTR(next_packed, char*);
+
/* copy the rest of the data */
for( i = 0; i < args->cd; i++ ) {
ompi_datatype_t* temp_data = args->d[i];
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int size, rank;
    MPI_Win win;
    MPI_Datatype datatype;
    MPI_Datatype datatypes[] = {MPI_INT, MPI_UB};
    int blengths[] = {1, 1};
    MPI_Aint displs[] = {0, sizeof(int)};
    int buf[] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (size < 2) {
        fprintf(stderr, "Needs at least 2 processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Type_create_struct(2, blengths, displs, datatypes, &datatype);
    MPI_Type_commit(&datatype);

    MPI_Win_create(buf, sizeof(int), 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_fence(0, win);

    if (rank == 0) {
        MPI_Put(buf, 1, datatype, 1, 0, 1, datatype, win);
    }

    MPI_Win_fence(0, win);
    MPI_Win_free(&win);
    MPI_Type_free(&datatype);

    MPI_Finalize();
    return 0;
}