Nadia,

I guess the revisions you mentioned are from HG? If I'm not mistaken, the change you mentioned corresponds to r29285. I don't think it is related to your issue: r29285 is about positioning a convertor, which is only used for multi-fragment messages, and that is not the case in your example.
I guess we should look at all the patches in opal/datatype and ompi/datatype over the last 13 months (the starting point of the 1.6.3).

  George.

On Nov 25, 2013, at 14:10 , Nadia Derbey <nadia.der...@bull.net> wrote:

> George,
>
> Thx for the detailed answer!
> I did my tests on a v1.6.2 (changeset: 141b22222759).
> After you told me it worked for you with earlier releases, I looked at the
> changesets applied since that time. I guess 28fd94d282a3 is the one that
> fixes my issue?
>
> Regards,
> Nadia
>
> On 25/11/2013 13:36, George Bosilca wrote:
>> Nadia,
>>
>> Which version of Open MPI are you using? I tried with the nightly r29751,
>> the current 1.6 and the current 1.7 and I __always__ got the expected output.
>>
>> There is a simple way to show what the datatype engine is doing. You can set
>> the MCA parameters mpi_ddt_unpack_debug and mpi_ddt_pack_debug to get more
>> info. If you only want to see how the datatype looks after the
>> MPI_Type_commit step, you can call ompi_datatype_dump(ddt) directly. This
>> will show the internals of the datatype, converted into predefined types.
>>
>> As an example I took the application you provided and built the following
>> picture of what is sent and what is received (original buffer, send
>> datatype, packed buffer, recv datatype, resulting buffer).
>>
>> <Mail Attachment.png>
>>
>> Now using ompi_datatype_dump, I see the recv and the send datatypes as:
>>
>> -cC---P-DB-[---][---] OPAL_UINT1 count 8 disp 0x0 (0) extent 1 (size 8)
>> -cC---P-DB-[---][---] OPAL_UINT1 count 8 disp 0x10 (16) extent 1 (size 8)
>> -cC---P-DB-[---][---] OPAL_INT4 count 4 disp 0x30 (48) extent 4 (size 16)
>> -------G---[---][---] OPAL_END_LOOP prev 3 elements first elem displacement 0 size of data 32
>>
>> -cC---P-DB-[---][---] OPAL_UINT1 count 24 disp 0x10 (16) extent 1 (size 24)
>> -cC---P-DB-[---][---] OPAL_UINT1 count 8 disp 0x30 (48) extent 1 (size 8)
>> -------G---[---][---] OPAL_END_LOOP prev 2 elements first elem displacement 16 size of data 32
>>
>> This matches the datatypes drawn by hand perfectly.
>>
>>   George.
>>
>> On Nov 25, 2013, at 11:40 , Nadia Derbey <nadia.der...@bull.net> wrote:
>>
>>> Hi,
>>>
>>> I'm currently working on a bug occurring at a client site with Open MPI
>>> when calling MPI_Sendrecv() on datatypes built by the application.
>>> I think I've found where the bug comes from (it is located in
>>> opal_generic_simple_pack_function(), in the file
>>> opal/datatype/opal_datatype_pack.c). But this code is so complicated that
>>> I'm more than unsure of my fix. What I can say is that it fixes things for
>>> me, but I need some advice from the datatype specialists.
>>>
>>> ---------------
>>>
>>> You will find attached the reproducer provided by the client, as well
>>> as the resulting output.
>>> datatypes.c : reproducer
>>> to run the binary: salloc --exclusive -p B510 -N 1 -n 1 mpirun ./datatypes
>>> trc_ko: traces obtained without the patch applied
>>> trc_ok: traces obtained with the patch applied.
>>>
>>> ---------------
>>>
>>> The proposed patch is the following. (Note that the very first change in
>>> this patch was enough in my case, but I thought all the "source_base"
>>> settings should follow this model.)
>>>
>>> -------------------------
>>> opal_generic_simple_pack_function: add the datatype lb when progressing in
>>> the input buffer
>>>
>>> diff -r cb23c2f07e1f opal/datatype/opal_datatype_pack.c
>>> --- a/opal/datatype/opal_datatype_pack.c Sun Nov 24 17:06:51 2013 +0000
>>> +++ b/opal/datatype/opal_datatype_pack.c Mon Nov 25 10:48:00 2013 +0100
>>> @@ -301,7 +301,7 @@ opal_generic_simple_pack_function( opal_
>>>                  PACK_PREDEFINED_DATATYPE( pConvertor, pElem, count_desc,
>>>                                            source_base, destination, iov_len_local );
>>>                  if( 0 == count_desc ) {  /* completed */
>>> -                    source_base = pConvertor->pBaseBuf + pStack->disp;
>>> +                    source_base = pConvertor->pBaseBuf + pStack->disp + pData->lb;
>>>                      pos_desc++;  /* advance to the next data */
>>>                      UPDATE_INTERNAL_COUNTERS( description, pos_desc, pElem, count_desc );
>>>                      continue;
>>> @@ -333,7 +333,7 @@ opal_generic_simple_pack_function( opal_
>>>                          pStack->disp += description[pStack->index].loop.extent;
>>>                      }
>>>                  }
>>> -                source_base = pConvertor->pBaseBuf + pStack->disp;
>>> +                source_base = pConvertor->pBaseBuf + pStack->disp + pData->lb;
>>>                  UPDATE_INTERNAL_COUNTERS( description, pos_desc, pElem, count_desc );
>>>                  DO_DEBUG( opal_output( 0, "pack new_loop count %d stack_pos %d pos_desc %d disp %ld space %lu\n",
>>>                                         (int)pStack->count, pConvertor->stack_pos, pos_desc, (long)pStack->disp, (unsigned long)iov_len_local ); );
>>> @@ -354,7 +354,7 @@ opal_generic_simple_pack_function( opal_
>>>                                   pStack->disp + local_disp);
>>>                  pos_desc++;
>>>              update_loop_description:  /* update the current state */
>>> -                source_base = pConvertor->pBaseBuf + pStack->disp;
>>> +                source_base = pConvertor->pBaseBuf + pStack->disp + pData->lb;
>>>                  UPDATE_INTERNAL_COUNTERS( description, pos_desc, pElem, count_desc );
>>>                  DDT_DUMP_STACK( pConvertor->pStack, pConvertor->stack_pos, pElem, "advance loop" );
>>>                  continue;
>>> @@ -374,7 +374,7 @@ opal_generic_simple_pack_function( opal_
>>>      }
>>>      /* I complete an element, next step I should go to the next one */
>>>      PUSH_STACK( pStack, pConvertor->stack_pos, pos_desc, OPAL_DATATYPE_INT8, count_desc,
>>> -                source_base - pStack->disp - pConvertor->pBaseBuf );
>>> +                source_base - pStack->disp - pConvertor->pBaseBuf - pData->lb );
>>>      DO_DEBUG( opal_output( 0, "pack save stack stack_pos %d pos_desc %d count_desc %d disp %ld\n",
>>>                             pConvertor->stack_pos, pStack->index, (int)pStack->count, (long)pStack->disp ); );
>>>      return 0;
>>>
>>> -------------------------------
>>>
>>> Regards,
>>> Nadia
>>> --
>>> Nadia Derbey
>>> Bull, Architect of an Open World
>>> http://www.bull.com
>>> <datatypes.c><trc_ko.txt><trc_ok.txt>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
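
For readers following the patch quoted above: every hunk changes the same address computation, adding the datatype lower bound (pData->lb) to pConvertor->pBaseBuf + pStack->disp. The sketch below only illustrates what that arithmetic does to the address read from. It is a standalone toy and not Open MPI code: toy_type and toy_pack are hypothetical names, and it assumes a single contiguous block of bytes. It says nothing about whether the real pack function needs the change, which the thread leaves open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a datatype description (NOT an Open MPI struct). */
typedef struct {
    ptrdiff_t lb;   /* lower bound: offset of the first data byte from the user buffer */
    size_t    size; /* number of contiguous bytes to copy in this toy example */
} toy_type;

/*
 * Copy t->size bytes from the user buffer into dst.
 *   add_lb == 0: read from base + disp       (the lines removed by the patch)
 *   add_lb != 0: read from base + disp + lb  (the lines added by the patch)
 */
static void toy_pack(char *dst, const char *base, ptrdiff_t disp,
                     const toy_type *t, int add_lb)
{
    assert(dst != NULL && base != NULL && t != NULL);
    const char *source_base = base + disp;
    if (add_lb) {
        source_base += t->lb;
    }
    memcpy(dst, source_base, t->size);
}
```

With a 64-byte buffer whose byte i holds the value i and a toy_type of { lb = 16, size = 8 }, calling toy_pack with add_lb == 0 copies bytes 0..7, while add_lb == 1 copies bytes 16..23. If the displacement already accounts for the lower bound, adding lb again would skip too far, which is one reason the change needs review by someone who knows the datatype engine.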