Hi Open MPI developers,

We, Fujitsu, noticed that one-sided communication with certain kinds of derived datatypes fails on sparc64 machines.
In Open MPI's one-sided communication, the structure of the target buffer's datatype is (1) encoded in the origin process, (2) transferred to the target process, and (3) decoded in the target process. The encoding and decoding are done in ompi_datatype_args.c, which takes alignment into account, but this handling seems insufficient.

On the encoding stage, the __ompi_datatype_pack_description function takes alignment into account, as described in its comment. For one-level derived datatypes, that code is OK. But for multi-level derived datatypes (i.e. derived datatypes created from derived datatypes), __ompi_datatype_pack_description is called recursively with an unaligned packed_buffer if args->ci is odd.

On the decoding stage, on the other hand, the __ompi_datatype_create_from_packed_description function expects padding for an odd args->ci. For derived datatypes, packed_buffer is always aligned to 64 bits, even when this function is called recursively.

This incompatibility causes a segmentation fault or a similar failure in the ompi_ddt_create_xxxx function called by the __ompi_ddt_create_from_args function.

The decoding stage and the buffer size calculation (ALLOC_ARGS macro) seem to take alignment into account sufficiently, so I think fixing the encoding stage is enough to resolve this bug.

I've attached patches for the trunk and the v1.4 branch respectively. A program (needs sparc64) to reproduce this problem is also attached.

This bug appears if all of the following conditions are met.

- sparc64 or another alignment-sensitive architecture
  (configure generates OPAL_ALIGN_WORD_SIZE_INTEGERS == 1)
- a derived datatype is used for the target buffer of one-sided communication
- that derived datatype is created by multiple levels of MPI_Type_create_xxxx
- one of the following functions is used at the second level or later
  (args->ci is odd)
  * MPI_Type_create_hvector
  * MPI_Type_create_struct
  * MPI_Type_create_hindexed
  * MPI_Type_create_indexed_block

Regards,
Takahiro Kawashima,
MPI development team,
Fujitsu
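As a rough illustration of the mismatch described above, the sketch below models only the offset arithmetic. It is not the code in ompi_datatype_args.c. The helper names (advance_unpadded, advance_padded, offsets_disagree, self_check) are hypothetical, and the model assumes 4-byte integers and 8-byte (64-bit) alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder step: advance past an array of `ci` 32-bit
 * integers without adding any padding. */
static size_t advance_unpadded(size_t ci)
{
    return ci * sizeof(int32_t);
}

/* Hypothetical decoder step: advance past the same array, rounded up
 * to the next multiple of 8 bytes (one padding integer when ci is odd). */
static size_t advance_padded(size_t ci)
{
    size_t bytes = ci * sizeof(int32_t);
    return (bytes + 7u) & ~(size_t)7u;
}

/* Nonzero when the two sides would disagree about where the nested
 * datatype description starts. */
static int offsets_disagree(size_t ci)
{
    return advance_unpadded(ci) != advance_padded(ci);
}

/* Small self-check of the model: even counts agree, odd counts do not. */
static void self_check(void)
{
    assert(!offsets_disagree(2));
    assert(offsets_disagree(3));
    assert(advance_padded(3) - advance_unpadded(3) == sizeof(int32_t));
}
```

In this model the two sides agree for even counts and differ by exactly one 32-bit integer for odd counts, which corresponds to the odd args->ci condition listed above.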
osc-derived.trunk.patch
Description: Binary data
osc-derived.v1.4.patch
Description: Binary data
osc-hvector.c
Description: Binary data