George - This looks right to me, but the patches are in the datatype engine, so can you weigh in?

Thanks,

Brian

On 1/11/12 10:04 PM, "Kawashima" <t-kawash...@jp.fujitsu.com> wrote:

>Hi Open MPI developers,
>
>We, Fujitsu, noticed that one-sided communication with some sort of
>derived datatype fails on sparc64 machines.
>
>In one-sided communication of Open MPI, the structure of the datatype
>of the target buffer is:
>  (1) encoded in the origin process, and
>  (2) transferred to the target process, and
>  (3) decoded in the target process.
>
>This encoding and decoding are processed in ompi_datatype_args.c, and
>it takes alignment into consideration. But that seems insufficient.
>
>In the encoding stage, the __ompi_datatype_pack_description function
>takes alignment into consideration, as described in its comment.
>For derived datatypes of one level, that code is OK.
>But for derived datatypes of multiple levels (i.e. derived datatypes
>created from derived datatypes), the __ompi_datatype_pack_description
>function is called recursively with an unaligned packed_buffer if
>args->ci is odd.
>
>On the other hand, in the decoding stage, the
>__ompi_datatype_create_from_packed_description function expects
>padding for an odd args->ci. For derived datatypes, packed_buffer is
>always aligned to 64 bits even if this function is called recursively.
>
>This incompatibility causes a segmentation fault or something
>in the ompi_ddt_create_xxxx function called by the
>__ompi_ddt_create_from_args function.
>
>It seems the decoding stage and the buffer size calculation
>(ALLOC_ARGS macro) take alignment sufficiently into consideration.
>So I think fixing the encoding stage is sufficient for this bug.
>
>I've attached patches for trunk and the v1.4 branch respectively.
>A program (needs sparc64) to reproduce this problem is also attached.
>
>This bug appears if all of the following conditions are met.
>
> - sparc64 or some alignment sensitive architectures
>   (configure generates OPAL_ALIGN_WORD_SIZE_INTEGERS == 1)
> - use derived datatype for target buffer of one-sided communication
> - create that derived datatype by multiple level MPI_Type_create_xxxx
> - use one of following function in second level or later
>   (args->ci is odd)
>    * MPI_Type_create_hvector
>    * MPI_Type_create_struct
>    * MPI_Type_create_hindexed
>    * MPI_Type_create_indexed_block
>
>
>Regards,
>
>Takahiro Kawashima,
>MPI development team,
>Fujitsu
>_______________________________________________
>devel mailing list
>de...@open-mpi.org
>http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Brian W. Barrett
Dept. 1423: Scalable System Software
Sandia National Laboratories
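
For anyone following the thread, the padding issue in the quoted report can be sketched in C. This is only an illustration under stated assumptions, not the Open MPI code or the attached patch: align8 and pack_ints_padded are hypothetical helpers, the integer arguments are assumed to be 4 bytes wide, and the field packed next is assumed to need an 8-byte boundary. With an odd count (the odd args->ci case in the report), the encoder has to emit 4 bytes of padding so that a nested description packed afterwards starts aligned, which is what the decoder is described as expecting.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round an offset up to the next multiple of 8. */
static size_t align8(size_t offset)
{
    return (offset + 7u) & ~(size_t)7u;
}

/*
 * Hypothetical encoder step: copy `ci` 4-byte integers into `buf` at
 * `offset`, then zero-fill up to the next 8-byte boundary.
 * Returns the offset at which the next (possibly nested) field starts.
 * The caller must ensure `buf` has room for the padded size.
 */
static size_t pack_ints_padded(unsigned char *buf, size_t offset,
                               const int32_t *ints, size_t ci)
{
    size_t padded;

    memcpy(buf + offset, ints, ci * sizeof(int32_t));
    offset += ci * sizeof(int32_t);

    /* An odd ci leaves the offset at 4 mod 8; pad so it becomes 0 mod 8. */
    padded = align8(offset);
    memset(buf + offset, 0, padded - offset);
    return padded;
}
```

Without the padding step, a recursive call for a nested datatype would begin writing at an offset that is only 4-byte aligned, while a decoder that skips the padding would read from a different position.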