George -

This looks right to me, but the patches are in the datatype engine, so can
you weigh in?

Thanks,

Brian

On 1/11/12 10:04 PM, "Kawashima" <t-kawash...@jp.fujitsu.com> wrote:

>Hi Open MPI developers,
>
>We at Fujitsu noticed that one-sided communication with certain
>derived datatypes fails on sparc64 machines.
>
>In Open MPI's one-sided communication, the structure of the datatype
>of the target buffer is:
>  (1) encoded in the origin process,
>  (2) transferred to the target process, and
>  (3) decoded in the target process.
>
>The encoding and decoding are done in ompi_datatype_args.c, which
>takes alignment into account. But the handling seems insufficient.
>
>At the encoding stage, the __ompi_datatype_pack_description function
>takes alignment into account, as described in its comment.
>For derived datatypes of one level, that code is OK.
>But for derived datatypes of multiple levels (i.e. derived datatypes
>created from derived datatypes), __ompi_datatype_pack_description
>is called recursively with an unaligned packed_buffer if
>args->ci is odd.
>
>At the decoding stage, on the other hand, the
>__ompi_datatype_create_from_packed_description function expects
>padding when args->ci is odd. For derived datatypes, packed_buffer is
>always aligned to 64 bits, even if this function is called recursively.
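
The mismatch described above can be illustrated with a minimal,
self-contained sketch. This is not the Open MPI code: align8,
encode_ints and decoded_offset are hypothetical helpers, and the layout
(a run of 32-bit integers followed by 64-bit-aligned data) is only an
assumption based on the description in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round an offset up to the next multiple of 8. */
size_t align8(size_t off)
{
    return (off + 7u) & ~(size_t)7u;
}

/*
 * Hypothetical encoder: copy `ci` 32-bit integers into `buf` at `off`.
 * If `pad` is nonzero, round the resulting offset up to 8 bytes so that
 * whatever is packed next starts on a 64-bit boundary.
 * Returns the offset at which the next field would be written.
 */
size_t encode_ints(unsigned char *buf, size_t off,
                   const int32_t *vals, int ci, int pad)
{
    memcpy(buf + off, vals, (size_t)ci * sizeof(int32_t));
    off += (size_t)ci * sizeof(int32_t);
    return pad ? align8(off) : off;
}

/*
 * Hypothetical decoder-side view: the decoder always assumes the
 * integer run was padded to a 64-bit boundary.
 */
size_t decoded_offset(size_t off, int ci)
{
    return align8(off + (size_t)ci * sizeof(int32_t));
}
```

With ci = 3, an encoder that skips the padding continues at offset 12,
while the decoder expects the next field at offset 16, so the two sides
disagree on where the nested description starts. With an even ci the
offsets agree, which matches the observation that only odd args->ci
triggers the failure.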
>
>This incompatibility causes a segmentation fault or a similar failure
>in the ompi_ddt_create_xxxx function called by the
>__ompi_ddt_create_from_args function.
>
>The decoding stage and the buffer size calculation (ALLOC_ARGS macro)
>seem to handle alignment sufficiently. So I think fixing the encoding
>stage is sufficient for this bug.
>
>I've attached patches for the trunk and the v1.4 branch respectively.
>A program (needs sparc64) to reproduce this problem is also attached.
>
>This bug appears if all of the following conditions are met:
>
>  - sparc64 or another alignment-sensitive architecture
>    (configure generates OPAL_ALIGN_WORD_SIZE_INTEGERS == 1)
>  - a derived datatype is used for the target buffer of one-sided
>    communication
>  - that derived datatype is created by multiple levels of
>    MPI_Type_create_xxxx
>  - one of the following functions is used at the second level or later
>    (args->ci is odd)
>      * MPI_Type_create_hvector
>      * MPI_Type_create_struct
>      * MPI_Type_create_hindexed
>      * MPI_Type_create_indexed_block
>
>
>Regards,
>
>Takahiro Kawashima,
>MPI development team,
>Fujitsu
>_______________________________________________
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
  Brian W. Barrett
  Dept. 1423: Scalable System Software
  Sandia National Laboratories





