Hi,

I'm running the Open MPI top-of-tree from GitHub and hitting an openib BTL issue where the qp/srq configuration is wrong for the given device ID. This works fine in 1.8.4rc1, but I see the problem on top-of-tree: a simple two-node IMB-MPI1 pingpong fails before the ranks even get set up. I see this logged:

/opt/ompi-trunk/bin/mpirun --allow-run-as-root --np 2 --host stevo1,stevo2 --mca btl openib,sm,self /opt/ompi-trunk/bin/IMB-MPI1 pingpong

<snip>

--------------------------------------------------------------------------
The Open MPI receive queue configuration for the OpenFabrics devices
on two nodes are incompatible, meaning that MPI processes on two
specific nodes were unable to communicate with each other.  This
generally happens when you are using OpenFabrics devices from
different vendors on the same network.  You should be able to use the
mca_btl_openib_receive_queues MCA parameter to set a uniform receive
queue configuration for all the devices in the MPI job, and therefore
be able to run successfully.

  Local host:       stevo2
  Local adapter:    cxgb4_0 (vendor 0x1425, part ID 21520)
  Local queues:     P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,65536,1024,1008,64

  Remote host:      stevo1
  Remote adapter:   (vendor 0x1425, part ID 21520)
  Remote queues:    P,65536,64
--------------------------------------------------------------------------

The stevo1 rank has the correct queue settings (P,65536,64), but for some reason stevo2 ends up with the wrong settings, even though the error reports the correct device ID for it.

Any suggestions on debugging this? E.g., where in the source should I dig to see whether the .ini parsing is somehow broken?
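For reference, here's roughly how I've been sanity-checking the .ini side by hand. This is just a sketch: the sample section below is illustrative (the field names follow the device-params format, but the real settings live in share/openmpi/mca-btl-openib-device-params.ini under the install prefix), and it assumes receive_queues appears after vendor_part_id within a section, as it does in the shipped file:

```shell
#!/bin/sh
# Sketch: pull the receive_queues value for a given vendor_part_id out of
# an openib device-params style .ini. Sample content is illustrative only.
INI=$(mktemp)
cat > "$INI" <<'EOF'
[Chelsio T4]
vendor_id = 0x1425
vendor_part_id = 21505,21506,21507,21520
receive_queues = P,65536,64
EOF

# Find the section whose vendor_part_id list contains 21520, then print
# that section's receive_queues value.
rq=$(awk -F' *= *' '
  /^\[/ { in_sec = 0 }
  $1 == "vendor_part_id" && $2 ~ /(^|,)21520(,|$)/ { in_sec = 1 }
  in_sec && $1 == "receive_queues" { print $2 }
' "$INI")

echo "$rq"   # -> P,65536,64
rm -f "$INI"
```

The file parses the same on both nodes when I check it this way, which is what makes me suspect the runtime parsing or the queue-config exchange rather than the .ini contents themselves.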


Thanks,

Steve.
