Nathan, or anybody with access to the target hardware,

If you can provide minimal output from the application, with and
without the above-mentioned patch, and with mpi_ddt_unpack_debug,
mpi_ddt_pack_debug, and mpi_ddt_position_debug all set to 1, I will try
to help.

  George.


On Thu, May 8, 2014 at 2:50 AM, Hjelm, Nathan T <hje...@lanl.gov> wrote:
> Since I have a system that has the scif libraries installed I will try to 
> reproduce and see if I can come up with a fix. It will probably be sometime 
> next week at the earliest.
>
> -Nathan
> ________________________________________
> From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet 
> [gilles.gouaillar...@iferc.org]
> Sent: Wednesday, May 07, 2014 9:03 PM
> To: de...@open-mpi.org
> Subject: Re: [OMPI devel] regression with derived datatypes
>
> On 2014/05/08 2:15, Ralph Castain wrote:
>> I wonder if that might also explain the issue reported by Gilles regarding 
>> the scif BTL? In his example, the problem only occurred if the message was 
>> split across scif and vader. If so, then it might be that splitting messages 
>> in general is broken.
>>
> I am afraid there is a misunderstanding:
> the problem always occurs with scif,vader,self (regardless of the ompi v1.8
> version);
> the problem occurs with scif,self only if r31496 is applied to ompi v1.8.
>
>
> In my previous email
> http://www.open-mpi.org/community/lists/devel/2014/05/14699.php
> i reported the following interesting fact :
>
> with ompi v1.8 (latest r31678), the following command produces incorrect
> results :
> mpirun -host localhost -np 2 --mca btl scif,self ./test_scif
>
> but with ompi v1.8 r31309, the very same command produces correct results
>
> Elena pointed out that r31496 is a suspect, so I took the latest v1.8
> (r31678), reverted r31496, and ...
>
>
> mpirun -host localhost -np 2 --mca btl scif,self ./test_scif
>
> works again !
>
> note that the "default"
> mpirun -host localhost -np 2 --mca btl scif,vader,self ./test_scif
> still produces incorrect results
>
> To reproduce the issue, a MIC is *not* needed:
> you only need to install the software stack, load the mic kernel module,
> and make sure you can read/write /dev/mic/*
>
> Bottom line, there are two issues here:
> 1) r31496 broke something:
>    mpirun -np 2 -host localhost --mca btl scif,self ./test_scif
> 2) something else never worked:
>    mpirun -np 2 -host localhost --mca btl scif,vader,self ./test_scif
>
> Gilles
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/05/14739.php
