Nathan, or anybody with access to the target hardware: if you can provide minimal output of the applications, with and without the above-mentioned patch, and with mpi_ddt_unpack_debug, mpi_ddt_pack_debug, and mpi_ddt_position_debug set to 1, I will try to help.
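A minimal sketch of such a run, assuming the three parameters are passed on the mpirun command line with --mca and reusing the scif,self reproducer quoted below. The log file names are hypothetical, and the script only prints the command so it can be checked before it is run on the target system:

```shell
#!/bin/sh
# The three datatype debug parameters requested above, each set to 1.
# Assumption: they are passed on the command line via --mca.
DDT_DEBUG="--mca mpi_ddt_unpack_debug 1 --mca mpi_ddt_pack_debug 1 --mca mpi_ddt_position_debug 1"

# Reproducer command line from the quoted report, with the debug flags added.
CMD="mpirun -host localhost -np 2 --mca btl scif,self $DDT_DEBUG ./test_scif"

# Print the command so it can be inspected before running it.
echo "$CMD"

# Run it once per build and keep both logs (file names are hypothetical):
#   $CMD > with_patch.log 2>&1       # build with the patch applied
#   $CMD > without_patch.log 2>&1    # build without the patch
```

Capturing stderr together with stdout (2>&1) keeps any debug output in the same log as the test's own results, whichever stream it is written to.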
  George.

On Thu, May 8, 2014 at 2:50 AM, Hjelm, Nathan T <hje...@lanl.gov> wrote:

> Since I have a system that has the scif libraries installed I will try to
> reproduce and see if I can come up with a fix. It will probably be sometime
> next week at the earliest.
>
> -Nathan
> ________________________________________
> From: devel [devel-boun...@open-mpi.org] on behalf of Gilles Gouaillardet
> [gilles.gouaillar...@iferc.org]
> Sent: Wednesday, May 07, 2014 9:03 PM
> To: de...@open-mpi.org
> Subject: Re: [OMPI devel] regression with derived datatypes
>
> On 2014/05/08 2:15, Ralph Castain wrote:
>> I wonder if that might also explain the issue reported by Gilles regarding
>> the scif BTL? In his example, the problem only occurred if the message was
>> split across scif and vader. If so, then it might be that splitting messages
>> in general is broken.
>>
> i am afraid there is a misunderstanding :
> the problem always occur with scif,vader,self (regardless the ompi v1.8
> version)
> the problem occurs with scif,self only if r31496 is applied to ompi v1.8
>
>
> In my previous email
> http://www.open-mpi.org/community/lists/devel/2014/05/14699.php
> i reported the following interesting fact :
>
> with ompi v1.8 (latest r31678), the following command produces incorrect
> results :
> mpirun -host localhost -np 2 --mca btl scif,self ./test_scif
>
> but with ompi v1.8 r31309, the very same command produces correct results
>
> Elena pointed that r31496 is a suspect. so i took the latest v1.8
> (r31678) and reverted r31496 and ...
>
> mpirun -host localhost -np 2 --mca btl scif,self ./test_scif
>
> works again !
>
> note that the "default"
> mpirun -host localhost -np 2 --mca btl scif,vader,self ./test_scif
> still produces incorrect results
>
> in order to reproduce the issue, a MIC is *not* needed,
> you only need to install the software stack, load the mic kernel module
> and make sure you can read/write /dev/mic/*
>
> bottom line, there are two issues here :
> 1) r31496 broke something : mpirun -np 2 -host localhost --mca btl
> scif,self ./test_scif
> 2) something else never worked : mpirun -np 2 -host localhost --mca btl
> scif,vader,self ./test_scif
>
> Gilles
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/05/14739.php
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/05/14742.php