Hi,
I'm reading the datatype code in the Open MPI trunk and have a question.
It is a bit long.
See the following program.
----------------------------------------------------------------
#include <stdio.h>
#include <mpi.h>
struct opal_datatype_t;
extern int opal_init(int *pargc, char ***pargv);
extern int opal_finalize(void);
extern void opal_datatype_dump(struct opal_datatype_t *type);
extern struct opal_datatype_t opal_datatype_int8;
int main(int argc, char **argv)
{
    opal_init(NULL, NULL);
    opal_datatype_dump(&opal_datatype_int8);  /* dump after opal_init() */
    MPI_Init(NULL, NULL);
    opal_datatype_dump(&opal_datatype_int8);  /* dump again after MPI_Init() */
    MPI_Finalize();
    opal_finalize();
    return 0;
}
----------------------------------------------------------------
All variables and functions declared 'extern' are defined in OPAL.
The opal_datatype_dump() function prints the internal data of a datatype.
I expect the two opal_datatype_dump() calls to produce the same output.
But when I run the program on an x86_64 machine, I get the following output.
----------------------------------------------------------------
ompi-trunk/opal-datatype-dump && ompiexec -n 1 ompi-trunk/opal-datatype-dump
[ppc.rivis.jp:27886] Datatype 0x600c60[OPAL_INT8] size 8 align 8 id 7 length 1
used 1
true_lb 0 true_ub 8 (true_extent 8) lb 0 ub 8 (extent 8)
nbElems 1 loops 0 flags 136 (commited contiguous )-cC---P-DB-[---][---]
contain OPAL_INT8
--C---P-D--[---][---] OPAL_INT8 count 1 disp 0x0 (0) extent 8 (size 8)
No optimized description
[ppc.rivis.jp:27886] Datatype 0x600c60[OPAL_INT8] size 8 align 8 id 7 length 1
used 1
true_lb 0 true_ub 8 (true_extent 8) lb 0 ub 8 (extent 8)
nbElems 1 loops 0 flags 136 (commited contiguous )-cC---P-DB-[---][---]
contain OPAL_INT8
--C---P-D--[---][---] count 1 disp 0x0 (0) extent 8 (size 8971008)
No optimized description
----------------------------------------------------------------
The first dump is what I expected. The second one, however, is not
identical to it: its contained datatype has no name and a very large
size (8971008).
That line is printed by the opal_datatype_dump_data_desc() function in
opal/datatype/opal_datatype_dump.c. For the contained datatype, it reads
opal_datatype_basicDatatypes[pDesc->elem.common.type]->name and
opal_datatype_basicDatatypes[pDesc->elem.common.type]->size.
In this case, pDesc->elem.common.type is
opal_datatype_int8.desc.desc[0].elem.common.type. It is initialized to 7
by the opal_datatype_init() function in opal/datatype/opal_datatype_module.c,
which is called during opal_init().
opal_datatype_int8.desc.desc points to opal_datatype_predefined_elem_desc[7*2].
However, when MPI_Init() is called, this value is overwritten.
The ompi_datatype_init() function in ompi/datatype/ompi_datatype_module.c,
which is called during MPI_Init(), initializes the OMPI datatypes in a
similar way.
When it initializes ompi_mpi_aint, ompi_mpi_aint.dt.super.desc.desc
points to opal_datatype_predefined_elem_desc[7*2], the same element that
opal_datatype_int8 points to, because ompi_mpi_aint is defined with the
OMPI_DATATYPE_INIT_PREDEFINED_BASIC_TYPE macro, which uses the
OPAL_DATATYPE_INITIALIZER_INT8 macro. As a result,
opal_datatype_int8.desc.desc[0].elem.common.type is overwritten
with 37.
Therefore, the second opal_datatype_dump() call in my program accesses
opal_datatype_basicDatatypes[37], but opal_datatype_basicDatatypes has
only 25 elements, so this read is out of bounds.
In summary:
  static initializers:
    opal_datatype_predefined_elem_desc[25] = {{0, ...}, ...};
    opal_datatype_int8.desc.desc = &opal_datatype_predefined_elem_desc[7*2];
    ompi_mpi_aint.dt.super.desc.desc = &opal_datatype_predefined_elem_desc[7*2];
  opal_init():
    opal_datatype_int8.desc.desc[0].elem.common.type = 7;
  MPI_Init():
    ompi_mpi_aint.dt.super.desc.desc[0].elem.common.type = 37;  /* same element */
  opal_datatype_dump():
    accesses opal_datatype_basicDatatypes[37]
Although user programs are unlikely to call opal_datatype_dump(),
ompi_datatype_init() should not corrupt the
opal_datatype_predefined_elem_desc array.
Although I described this for opal_datatype_int8 and ompi_mpi_aint,
the same thing happens to other datatypes.
I tried to fix this problem, but I could not figure out the
correct solution:
- Should the first loop in ompi_datatype_init() be removed?
  But aren't the OMPI Fortran datatypes supposed to be initialized in it?
- Should all OMPI datatypes point to the ompi_datatype_predefined_elem_desc
  array instead? But is it allowed for OPAL datatypes and OMPI
  datatypes to have the same 'type' value?
Regards,
KAWASHIMA Takahiro