Rolf,

I didn’t see these failures on my check run. Can you run the MPI_Isend_ator test with 
mpi_ddt_pack_debug and mpi_ddt_unpack_debug set to 1? I would be interested in 
the output you get on your machine.
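Something like the following should do it — this reuses your command line from below and just adds the two debug parameters; adjust hosts and paths to your setup:

```shell
# Re-run the failing test with datatype pack/unpack debugging enabled.
# Host names and the test binary are taken from the earlier runs below.
mpirun --mca btl self,openib -np 2 \
       -host drossetti-ivy0,drossetti-ivy1 \
       --mca btl_openib_warn_default_gid_prefix 0 \
       --mca mpi_ddt_pack_debug 1 \
       --mca mpi_ddt_unpack_debug 1 \
       MPI_Isend_ator_c
```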

George.


On Apr 16, 2014, at 14:34 , Rolf vandeVaart <rvandeva...@nvidia.com> wrote:

> I have seen errors when running the Intel test suite using the openib BTL 
> when transferring derived datatypes.  I do not see the errors with the sm or 
> tcp BTLs.  The errors begin after this checkin:
> 
> https://svn.open-mpi.org/trac/ompi/changeset/31370
> Timestamp: 04/11/14 16:06:56 (5 days ago)
> Author: bosilca
> Message: Reshape all the packing/unpacking functions to use the same 
> skeleton. Rewrite the
> generic_unpacking to take advantage of the same capabilitites.
> 
> Does anyone else see errors?  Here is an example running with r31370:
> 
> [rvandevaart@drossetti-ivy1 src]$ mpirun --mca btl self,openib -np 2 -host 
> drossetti-ivy0,drossetti-ivy1 --mca btl_openib_warn_default_gid_prefix 0 
> MPI_Isend_ator_c
> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
> MPITEST error (1): 2 errors in buffer (17,0,12) len 273 commsize 2 commtype 
> -10 data_type 13 root 1
> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
> MPITEST error (1): 2 errors in buffer (17,2,12) len 273 commsize 2 commtype 
> -16 data_type 13 root 1
> MPITEST info  (0): Starting MPI_Isend_ator: All Isend TO Root test
> MPITEST info  (0): Node spec MPITEST_comm_sizes[6]=2 too large, using 1
> MPITEST info  (0): Node spec MPITEST_comm_sizes[22]=2 too large, using 1
> MPITEST info  (0): Node spec MPITEST_comm_sizes[32]=2 too large, using 1
> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
> MPITEST error (0): 2 errors in buffer (17,0,12) len 273 commsize 2 commtype 
> -10 data_type 13 root 0
> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
> MPITEST error (0): 2 errors in buffer (17,2,12) len 273 commsize 2 commtype 
> -16 data_type 13 root 0
> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
> MPITEST error (1): 2 errors in buffer (17,4,12) len 273 commsize 2 commtype 
> -13 data_type 13 root 1
> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 118
> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -60
> MPITEST error (0): 2 errors in buffer (17,4,12) len 273 commsize 2 commtype 
> -13 data_type 13 root 0
> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
> MPITEST error (1): 2 errors in buffer (17,6,12) len 273 commsize 2 commtype 
> -15 data_type 13 root 0
> MPITEST error (0): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
> MPITEST error (0): libmpitest.c:1578 i=195, char value=-1, expected -61
> MPITEST error (0): 2 errors in buffer (17,6,12) len 273 commsize 2 commtype 
> -15 data_type 13 root 0
> MPITEST_results: MPI_Isend_ator: All Isend TO Root 8 tests FAILED (of 3744)
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status, thus 
> causing
> the job to be terminated. The first process to do so was:
> 
>  Process name: [[12363,1],0]
>  Exit code:    4
> --------------------------------------------------------------------------
> [rvandevaart@drossetti-ivy1 src]$ 
> 
> 
> Here is an error with the trunk, which is slightly different:
> [rvandevaart@drossetti-ivy1 src]$ mpirun --mca btl self,openib -np 2 -host 
> drossetti-ivy0,drossetti-ivy1 --mca btl_openib_warn_default_gid_prefix 0 
> MPI_Isend_ator_c
> [drossetti-ivy1.nvidia.com:22875] 
> ../../../opal/datatype/opal_datatype_position.c:72
>       Pointer 0x1ad414c size 4 is outside [0x1ac1d20,0x1ad1d08] for
>       base ptr 0x1ac1d20 count 273 and data 
> [drossetti-ivy1.nvidia.com:22875] Datatype 0x1ac0220[] size 104 align 16 id 0 
> length 22 used 21
> true_lb 0 true_ub 232 (true_extent 232) lb 0 ub 240 (extent 240)
> nbElems 21 loops 0 flags 1C4 (commited )-c--lu-GD--[---][---]
>   contain lb ub OPAL_LB OPAL_UB OPAL_INT1 OPAL_INT2 OPAL_INT4 OPAL_INT8 
> OPAL_UINT1 OPAL_UINT2 OPAL_UINT4 OPAL_UINT8 OPAL_FLOAT4 OPAL_FLOAT8 
> OPAL_FLOAT16 
> --C---P-D--[---][---]      OPAL_INT4 count 1 disp 0x0 (0) extent 4 (size 4)
> --C---P-D--[---][---]      OPAL_INT2 count 1 disp 0x8 (8) extent 2 (size 2)
> --C---P-D--[---][---]      OPAL_INT8 count 1 disp 0x10 (16) extent 8 (size 8)
> --C---P-D--[---][---]     OPAL_UINT2 count 1 disp 0x20 (32) extent 2 (size 2)
> --C---P-D--[---][---]     OPAL_UINT4 count 1 disp 0x24 (36) extent 4 (size 4)
> --C---P-D--[---][---]     OPAL_UINT8 count 1 disp 0x30 (48) extent 8 (size 8)
> --C---P-D--[---][---]    OPAL_FLOAT4 count 1 disp 0x40 (64) extent 4 (size 4)
> --C---P-D--[---][---]      OPAL_INT1 count 1 disp 0x48 (72) extent 1 (size 1)
> --C---P-D--[---][---]    OPAL_FLOAT8 count 1 disp 0x50 (80) extent 8 (size 8)
> --C---P-D--[---][---]     OPAL_UINT1 count 1 disp 0x60 (96) extent 1 (size 1)
> --C---P-D--[---][---]   OPAL_FLOAT16 count 1 disp 0x70 (112) extent 16 (size 
> 16)
> --C---P-D--[---][---]      OPAL_INT1 count 1 disp 0x90 (144) extent 1 (size 1)
> --C---P-D--[---][---]     OPAL_UINT1 count 1 disp 0x92 (146) extent 1 (size 1)
> --C---P-D--[---][---]      OPAL_INT2 count 1 disp 0x94 (148) extent 2 (size 2)
> --C---P-D--[---][---]     OPAL_UINT2 count 1 disp 0x98 (152) extent 2 (size 2)
> --C---P-D--[---][---]      OPAL_INT4 count 1 disp 0x9c (156) extent 4 (size 4)
> --C---P-D--[---][---]     OPAL_UINT4 count 1 disp 0xa4 (164) extent 4 (size 4)
> --C---P-D--[---][---]      OPAL_INT8 count 1 disp 0xb0 (176) extent 8 (size 8)
> --C---P-D--[---][---]     OPAL_UINT8 count 1 disp 0xc0 (192) extent 8 (size 8)
> --C---P-D--[---][---]      OPAL_INT8 count 1 disp 0xd0 (208) extent 8 (size 8)
> --C---P-D--[---][---]     OPAL_UINT8 count 1 disp 0xe0 (224) extent 8 (size 8)
> -------G---[---][---]  OPAL_END_LOOP prev 21 elements first elem displacement 
> 0 size of data 104
> Optimized description 
> -cC---P-DB-[---][---]      OPAL_INT4 count 1 disp 0x0 (0) extent 4 (size 4)
> -cC---P-DB-[---][---]      OPAL_INT2 count 1 disp 0x8 (8) extent 2 (size 2)
> -cC---P-DB-[---][---]      OPAL_INT8 count 1 disp 0x10 (16) extent 8 (size 8)
> -cC---P-DB-[---][---]     OPAL_UINT2 count 1 disp 0x20 (32) extent 2 (size 2)
> -cC---P-DB-[---][---]     OPAL_UINT4 count 1 disp 0x24 (36) extent 4 (size 4)
> -cC---P-DB-[---][---]     OPAL_UINT8 count 1 disp 0x30 (48) extent 8 (size 8)
> -cC---P-DB-[---][---]    OPAL_FLOAT4 count 1 disp 0x40 (64) extent 4 (size 4)
> -cC---P-DB-[---][---]      OPAL_INT1 count 1 disp 0x48 (72) extent 1 (size 1)
> -cC---P-DB-[---][---]    OPAL_FLOAT8 count 1 disp 0x50 (80) extent 8 (size 8)
> -cC---P-DB-[---][---]     OPAL_UINT1 count 1 disp 0x60 (96) extent 1 (size 1)
> -cC---P-DB-[---][---]   OPAL_FLOAT16 count 1 disp 0x70 (112) extent 16 (size 
> 16)
> -cC---P-DB-[---][---]      OPAL_INT1 count 1 disp 0x90 (144) extent 1 (size 1)
> -cC---P-DB-[---][---]     OPAL_UINT1 count 1 disp 0x92 (146) extent 1 (size 1)
> -cC---P-DB-[---][---]      OPAL_INT2 count 1 disp 0x94 (148) extent 2 (size 2)
> -cC---P-DB-[---][---]     OPAL_UINT2 count 1 disp 0x98 (152) extent 2 (size 2)
> -cC---P-DB-[---][---]      OPAL_INT4 count 1 disp 0x9c (156) extent 4 (size 4)
> -cC---P-DB-[---][---]     OPAL_UINT4 count 1 disp 0xa4 (164) extent 4 (size 4)
> -cC---P-DB-[---][---]      OPAL_INT8 count 1 disp 0xb0 (176) extent 8 (size 8)
> -cC---P-DB-[---][---]     OPAL_UINT8 count 1 disp 0xc0 (192) extent 8 (size 8)
> -cC---P-DB-[---][---]      OPAL_INT8 count 1 disp 0xd0 (208) extent 8 (size 8)
> -cC---P-DB-[---][---]     OPAL_UINT8 count 1 disp 0xe0 (224) extent 8 (size 8)
> -------G---[---][---]  OPAL_END_LOOP prev 21 elements first elem displacement 
> 0 size of data 104
> 
> MPITEST error (1): libmpitest.c:1578 i=0, char value=-61, expected 0
> MPITEST error (1): libmpitest.c:1608 i=0, int32_t value=117, expected 0
> MPITEST error (1): libmpitest.c:1608 i=117, int32_t value=-1, expected 117
> MPITEST error (1): libmpitest.c:1578 i=195, char value=-1, expected -61
> MPITEST error (1): 4 errors in buffer (17,0,12) len 273 commsize 2 commtype 
> -10 data_type 13 root 1
> MPITEST info  (0): Starting MPI_Isend_ator: All Isend TO Root test
> MPITEST info  (0): Node spec MPITEST_comm_sizes[6]=2 too large, using 1
> MPITEST info  (0): Node spec MPITEST_comm_sizes[22]=2 too large, using 1
> MPITEST info  (0): Node spec MPITEST_comm_sizes[32]=2 too large, using 1
> MPITEST_results: MPI_Isend_ator: All Isend TO Root 1 tests FAILED (of 3744)
> -------------------------------------------------------
> Primary job  terminated normally, but 1 process returned
> a non-zero exit code.. Per user-direction, the job has been aborted.
> -------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun detected that one or more processes exited with non-zero status, thus 
> causing
> the job to be terminated. The first process to do so was:
> 
>  Process name: [[12296,1],1]
>  Exit code:    1
> --------------------------------------------------------------------------
> [rvandevaart@drossetti-ivy1 src]$ 
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/04/14553.php