Hi,

I'm reading the datatype code in the Open MPI trunk and have a question.
It is a bit long.

Consider the following program.

----------------------------------------------------------------
#include <stdio.h>
#include <mpi.h>

struct opal_datatype_t;
extern int opal_init(int *pargc, char ***pargv);
extern int opal_finalize(void);
extern void opal_datatype_dump(struct opal_datatype_t *type);
extern struct opal_datatype_t opal_datatype_int8;

int main(int argc, char **argv)
{
    opal_init(NULL, NULL);
    opal_datatype_dump(&opal_datatype_int8);
    MPI_Init(NULL, NULL);
    opal_datatype_dump(&opal_datatype_int8);
    MPI_Finalize();
    opal_finalize();
    return 0;
}
----------------------------------------------------------------

All variables and functions declared 'extern' are defined in OPAL.
The opal_datatype_dump() function prints the internal data of a datatype.
I expected the same output from both opal_datatype_dump() calls,
but when I run the program on an x86_64 machine, I get the following output.

----------------------------------------------------------------
ompi-trunk/opal-datatype-dump && ompiexec -n 1 ompi-trunk/opal-datatype-dump
[ppc.rivis.jp:27886] Datatype 0x600c60[OPAL_INT8] size 8 align 8 id 7 length 1 used 1
true_lb 0 true_ub 8 (true_extent 8) lb 0 ub 8 (extent 8)
nbElems 1 loops 0 flags 136 (commited contiguous )-cC---P-DB-[---][---]
   contain OPAL_INT8
--C---P-D--[---][---]      OPAL_INT8 count 1 disp 0x0 (0) extent 8 (size 8)
No optimized description

[ppc.rivis.jp:27886] Datatype 0x600c60[OPAL_INT8] size 8 align 8 id 7 length 1 used 1
true_lb 0 true_ub 8 (true_extent 8) lb 0 ub 8 (extent 8)
nbElems 1 loops 0 flags 136 (commited contiguous )-cC---P-DB-[---][---]
   contain OPAL_INT8
--C---P-D--[---][---]               count 1 disp 0x0 (0) extent 8 (size 8971008)
No optimized description
----------------------------------------------------------------

The first output is what I expected. The second is not identical
to the first: its content datatype has no name and a very large
size.

This line is printed by the opal_datatype_dump_data_desc() function in
opal/datatype/opal_datatype_dump.c. It refers to
opal_datatype_basicDatatypes[pDesc->elem.common.type]->name and
opal_datatype_basicDatatypes[pDesc->elem.common.type]->size for
the content datatype.

In this case, pDesc->elem.common.type is
opal_datatype_int8.desc.desc[0].elem.common.type. It is initialized to 7
by the opal_datatype_init() function in opal/datatype/opal_datatype_module.c,
which is called during opal_init().
opal_datatype_int8.desc.desc points to &opal_datatype_predefined_elem_desc[7*2].

But if we call MPI_Init(), the value is overwritten.
The ompi_datatype_init() function in ompi/datatype/ompi_datatype_module.c,
which is called during MPI_Init(), uses a similar procedure to
initialize the OMPI datatypes.

When it initializes ompi_mpi_aint, ompi_mpi_aint.dt.super.desc.desc
points to &opal_datatype_predefined_elem_desc[7*2], the same element
that opal_datatype_int8 points to, because ompi_mpi_aint is defined by
the OMPI_DATATYPE_INIT_PREDEFINED_BASIC_TYPE macro, which uses the
OPAL_DATATYPE_INITIALIZER_INT8 macro. As a result,
opal_datatype_int8.desc.desc[0].elem.common.type is overwritten
with 37.

Therefore, in the second opal_datatype_dump() call in my
program, opal_datatype_basicDatatypes[37] is accessed.
But opal_datatype_basicDatatypes has only 25 elements, so this
access is out of bounds.

To summarize:

  static initializer:
    opal_datatype_predefined_elem_desc[25] = {{0, ...}, ...};
    opal_datatype_int8.desc.desc = &opal_datatype_predefined_elem_desc[7*2];
    ompi_mpi_aint.dt.super.desc.desc = &opal_datatype_predefined_elem_desc[7*2];

  opal_init:
    opal_datatype_int8.desc.desc[0].elem.common.type = 7;

  MPI_Init:
    ompi_mpi_aint.dt.super.desc.desc[0].elem.common.type = 37;

  opal_datatype_dump:
    access to opal_datatype_basicDatatypes[37]

Although opal_datatype_dump() is unlikely to be called from
user programs, corrupting the opal_datatype_predefined_elem_desc
array in ompi_datatype_init() is still not good.

I described the problem using opal_datatype_int8 and ompi_mpi_aint,
but the same thing happens with other datatypes.

I tried to fix this problem but could not figure out the
correct solution.

  - Should the first loop in ompi_datatype_init() be removed?
    But don't the OMPI Fortran datatypes need to be initialized there?

  - Should all OMPI datatypes point to the ompi_datatype_predefined_elem_desc
    array instead? But is it allowed for OPAL datatypes and OMPI
    datatypes to have the same 'type' value?

Regards,
KAWASHIMA Takahiro
