On 2014/05/08 2:15, Ralph Castain wrote:
> I wonder if that might also explain the issue reported by Gilles regarding 
> the scif BTL? In his example, the problem only occurred if the message was 
> split across scif and vader. If so, then it might be that splitting messages 
> in general is broken.
>
i am afraid there is a misunderstanding :
the problem always occur with scif,vader,self (regardless the ompi v1.8
version)
the problem occurs with scif,self only if r31496 is applied to ompi v1.8


In my previous email
http://www.open-mpi.org/community/lists/devel/2014/05/14699.php
i reported the following interesting fact :

with ompi v1.8 (latest r31678), the following command produces incorrect
results :
mpirun -host localhost -np 2 --mca btl scif,self ./test_scif

but with ompi v1.8 r31309, the very same command produces correct results

Elena pointed that r31496 is a suspect. so i took the latest v1.8
(r31678) and reverted r31496 and ...


mpirun -host localhost -np 2 --mca btl scif,self ./test_scif

works again !

note that the "default"
mpirun -host localhost -np 2 --mca btl scif,vader,self ./test_scif
still produces incorrect results

in order to reproduce the issue, a MIC is *not* needed,
you only need to install the software stack, load the mic kernel module
and make sure you can read/write /dev/mic/*

bottom line, there are two issues here :
1) r31496 broke something : mpirun -np 2 -host localhost --mca btl
scif,self ./test_scif
2) something else never worked : mpirun -np 2 -host localhost --mca btl
scif,vader,self ./test_scif

Gilles

Reply via email to