On Wed, Jun 13, 2007 at 01:54:28PM -0400, Jeff Squyres wrote:
> On Jun 13, 2007, at 1:37 PM, Gleb Natapov wrote:
>
> >> I have 2 hosts: one with 3 active ports and one with 2 active ports.
> >> If I run an MPI job between them, the openib BTL wireup got badly and
> >> it aborts. So handling a heterogeneous number of ports is not
> >> currently handled properly in the code.
> > Are the all in the same subnet? If not I fixed some bug yesterday that
> > may help.
>
> No, they are not all on the same subnet:
>
> host svbu-mpi002:
>    port 1: DDR, subnet A
>    ports 2 and 3: SDR, subnet B
>
> host svbu-mpi003:
>    port 1: DDR, subnet A
>    port 2: SDR, subnet B
>
> With today's trunk, I still see the problem:
>
> [10:52] svbu-mpi:~/mpi % mpirun --mca btl openib,self -np 2 --host svbu-mpi002,svbu-mpi003 ring
> Process 1 waiting to receive from 0: tag 201
> Process 0 sending 10 to 1, tag 201
> [svbu-mpi002][0,1,0][btl_openib_endpoint.c:794:mca_btl_openib_endpoint_recv] can't find suitable endpoint for this peer
>
Now I see that my fix was in the right place, but still a little bit wrong.
I committed a fix to my fix in r15073. Can you check it?

-- Gleb.