I have run into the issue as well. I will open a pull request for 1.8.4
as part of a patch fixing the coalescing issues.

-Nathan

On Tue, Nov 04, 2014 at 02:50:30PM -0600, Steve Wise wrote:
> On 11/4/2014 2:09 PM, Steve Wise wrote:
> >Hi,
> >
> >I'm running ompi top-of-tree from github and seeing an openib btl issue
> >where the qp/srq configuration is incorrect for the given device id.  This
> >works fine in 1.8.4rc1, but I see the problem in top-of-tree.  A simple 2
> >node IMB-MPI1 pingpong fails to get the ranks set up.  I see this logged:
> >
> >/opt/ompi-trunk/bin/mpirun --allow-run-as-root --np 2 --host stevo1,stevo2
> >--mca btl openib,sm,self /opt/ompi-trunk/bin/IMB-MPI1 pingpong
> >
> 
> Adding this works around the issue:
> 
> --mca btl_openib_receive_queues P,65536,64
> 
> I also confirmed that opal_btl_openib_ini_query() is getting the correct
> receive_queues string from the .ini file on both nodes for the cxgb4
> device...
> 
> 
> ><snip>
> >
> >--------------------------------------------------------------------------
> >
> >The Open MPI receive queue configuration for the OpenFabrics devices
> >on two nodes are incompatible, meaning that MPI processes on two
> >specific nodes were unable to communicate with each other.  This
> >generally happens when you are using OpenFabrics devices from
> >different vendors on the same network.  You should be able to use the
> >mca_btl_openib_receive_queues MCA parameter to set a uniform receive
> >queue configuration for all the devices in the MPI job, and therefore
> >be able to run successfully.
> >
> >  Local host:       stevo2
> >  Local adapter:    cxgb4_0 (vendor 0x1425, part ID 21520)
> >  Local queues:     P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64
> >
> >  Remote host:      stevo1
> >  Remote adapter:   (vendor 0x1425, part ID 21520)
> >  Remote queues:    P,65536,64
> >----------------------------------------------------------------------------
> >
> >
> >The stevo1 rank has the correct queue settings: P,65536,64.  For some
> >reason, stevo2 has the wrong settings, even though it has the correct
> >device id info.
> >
> >Any suggestions on debugging this?  Like where to dig in the src to see if
> >somehow the .ini parsing is broken...
> >
> >
> >Thanks,
> >
> >Steve.
> >_______________________________________________
> >devel mailing list
> >de...@open-mpi.org
> >Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >Link to this post:
> >http://www.open-mpi.org/community/lists/devel/2014/11/16179.php
> 
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/11/16180.php
