Hi,

I'm reading the datatype code in the Open MPI trunk and have a question. It's a bit long.
See the following program.

----------------------------------------------------------------
#include <stdio.h>
#include <mpi.h>

struct opal_datatype_t;
extern int opal_init(int *pargc, char ***pargv);
extern int opal_finalize(void);
extern void opal_datatype_dump(struct opal_datatype_t *type);
extern struct opal_datatype_t opal_datatype_int8;

int main(int argc, char **argv)
{
    opal_init(NULL, NULL);
    opal_datatype_dump(&opal_datatype_int8);
    MPI_Init(NULL, NULL);
    opal_datatype_dump(&opal_datatype_int8);
    MPI_Finalize();
    opal_finalize();
    return 0;
}
----------------------------------------------------------------

All variables and functions declared as 'extern' are defined in OPAL. The opal_datatype_dump() function prints the internal data of a datatype. I expect the two opal_datatype_dump() calls to produce the same output. But when I run the program on an x86_64 machine, I get the following output.

----------------------------------------------------------------
ompi-trunk/opal-datatype-dump && ompiexec -n 1 ompi-trunk/opal-datatype-dump
[ppc.rivis.jp:27886] Datatype 0x600c60[OPAL_INT8] size 8 align 8 id 7 length 1 used 1
true_lb 0 true_ub 8 (true_extent 8) lb 0 ub 8 (extent 8)
nbElems 1 loops 0 flags 136 (commited contiguous )-cC---P-DB-[---][---]
   contain OPAL_INT8
--C---P-D--[---][---] OPAL_INT8 count 1 disp 0x0 (0) extent 8 (size 8)
No optimized description

[ppc.rivis.jp:27886] Datatype 0x600c60[OPAL_INT8] size 8 align 8 id 7 length 1 used 1
true_lb 0 true_ub 8 (true_extent 8) lb 0 ub 8 (extent 8)
nbElems 1 loops 0 flags 136 (commited contiguous )-cC---P-DB-[---][---]
   contain OPAL_INT8
--C---P-D--[---][---] count 1 disp 0x0 (0) extent 8 (size 8971008)
No optimized description
----------------------------------------------------------------

The first output is what I expected. The second one differs from it: its content datatype has no name and a very large size. That line is printed by the opal_datatype_dump_data_desc() function in opal/datatype/opal_datatype_dump.c.
For the content datatype, it reads opal_datatype_basicDatatypes[pDesc->elem.common.type]->name and opal_datatype_basicDatatypes[pDesc->elem.common.type]->size. In this case, pDesc->elem.common.type is opal_datatype_int8.desc.desc[0].elem.common.type. It is initialized to 7 in the opal_datatype_init() function in opal/datatype/opal_datatype_module.c, which is called during opal_init(). opal_datatype_int8.desc.desc points to &opal_datatype_predefined_elem_desc[7*2].

But if we call MPI_Init(), that value is overwritten. The ompi_datatype_init() function in ompi/datatype/ompi_datatype_module.c, which is called during MPI_Init(), follows a similar procedure to initialize the OMPI datatypes. When it initializes ompi_mpi_aint, ompi_mpi_aint.dt.super.desc.desc points to &opal_datatype_predefined_elem_desc[7*2], which is the same element opal_datatype_int8 points to. This is because ompi_mpi_aint is defined by the OMPI_DATATYPE_INIT_PREDEFINED_BASIC_TYPE macro, which uses the OPAL_DATATYPE_INITIALIZER_INT8 macro. So opal_datatype_int8.desc.desc[0].elem.common.type is overwritten with 37. As a result, the second opal_datatype_dump() call in my program accesses opal_datatype_basicDatatypes[37], even though opal_datatype_basicDatatypes has only 25 elements.

To summarize:

static initializer:
  opal_datatype_predefined_elem_desc[25] = {{0, ...}, ...};
  opal_datatype_int8.desc.desc = &opal_datatype_predefined_elem_desc[7*2];
  ompi_mpi_aint.dt.super.desc.desc = &opal_datatype_predefined_elem_desc[7*2];
opal_init:
  opal_datatype_int8.desc.desc[0].elem.common.type = 7;
MPI_Init:
  ompi_mpi_aint.dt.super.desc.desc[0].elem.common.type = 37;
opal_datatype_dump:
  access to opal_datatype_basicDatatypes[37]

User programs probably won't call opal_datatype_dump(). Even so, ompi_datatype_init() should not corrupt the opal_datatype_predefined_elem_desc array. I have described this for opal_datatype_int8 and ompi_mpi_aint, but the same thing happens to other datatypes.
I tried to fix this problem, but I could not figure out the correct solution.

- Should the first loop in ompi_datatype_init() be removed? But don't the OMPI Fortran datatypes need to be initialized in it?
- Should all OMPI datatypes point to the ompi_datatype_predefined_elem_desc array instead? But is it allowed for OPAL datatypes and OMPI datatypes to have the same 'type' value?

Regards,
KAWASHIMA Takahiro