In looking at this, perhaps you can help me understand something. The grpcomm hier modex is the same regardless of what info is given to it. So how is it that this works fine with IB, but not for the TCP btl? Is the TCP btl relying on something in the modex to track data identity that the IB btl doesn't?
On Mar 16, 2011, at 2:25 PM, George Bosilca wrote:

> Actually I think that Damien's analysis is correct. On an 8-node cluster
>
>   mpirun -npernode 1 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv
>
> does work, while
>
>   mpirun -npernode 2 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv
>
> doesn't. As soon as I remove the grpcomm hier option (i.e. use the bad module instead), everything works as expected.
>
> I just committed a patch (r24534) to the TCP BTL to output more information, and here is what I get when I add --mca btl_base_verbose 100 to the mpirun:
>
> [node02:01565] btl: tcp: attempting to connect() to [[14725,1],0] address 192.168.3.1 on port 1024
> [node02:01565] btl: tcp: attempting to connect() to [[14725,1],1] address 192.168.3.1 on port 1024
> [node01:31562] btl: tcp: attempting to connect() to [[14725,1],2] address 192.168.3.2 on port 1026
> [node01:31561] btl: tcp: attempting to connect() to [[14725,1],2] address 192.168.3.2 on port 1026
> [node01:31562] btl: tcp: attempting to connect() to [[14725,1],3] address 192.168.3.2 on port 1026
>
> The "-npernode 2" option places 2 processes per node, so vpid 0 and 1 will be on node01 and vpid 2 and 3 will be on node02. Looking at the BTL TCP connection attempts, one can clearly see that process 01565 on node02 thinks that both vpid 0 and 1 can be reached at address 192.168.3.1 on port 1024, which is obviously wrong.
>
> As removing the grpcomm hier solves the problem, I would expect the issue is not in the TCP BTL.
>
>   george.
>
>
> On Mar 16, 2011, at 15:16, Ralph Castain wrote:
>
>> I suspect something else is wrong - the grpcomm system never has any visibility into what data goes into the modex, or how that data is used. In other words, if the tcp btl isn't providing adequate info, then it would fail regardless of which grpcomm module was in use. So your statement about the hier module not distinguishing between peers on the same node doesn't make sense - the hier module has no idea that a tcp btl even exists, let alone has anything to do with the modex data.
>>
>> You might take a look at how the tcp btl is picking its sockets. The srun direct launch method may be setting envars that confuse it, perhaps causing the procs to all pick the same socket.
>>
>>
>> On Mar 16, 2011, at 12:48 PM, Damien Guinier wrote:
>>
>>> Hi all,
>>>
>>> From my tests, it is impossible to use "btl:tcp" with "grpcomm:hier". The "grpcomm:hier" module is important because the "srun" launch protocol can't use any other "grpcomm" module.
>>> You can reproduce this bug by using "btl:tcp" with "grpcomm:hier" when you create a ring (like IMB Sendrecv):
>>>
>>> $> salloc -N 2 -n 4 mpirun --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv
>>> salloc: Granted job allocation 2979
>>> [cuzco95][[59536,1],2][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[59536,1],0]
>>> [cuzco92][[59536,1],0][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[59536,1],2]
>>> ^C
>>> $>
>>>
>>> This error message shows that "btl:tcp" has created a connection to a peer, but not the right one (peer identity is checked with the "ack").
>>> To create a connection between two peers with "btl:tcp":
>>> - Each peer broadcasts its IP parameters with ompi_modex_send().
>>> - The IP parameters of the selected peer are received with ompi_modex_recv().
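
For reference, the publish/lookup pattern Damien describes boils down to the sketch below. This is simplified and illustrative, not code copied from the tree: the address struct is abbreviated (the real type is mca_btl_tcp_addr_t) and the helper function names are made up; only the ompi_modex_* calls and the component argument follow the actual API.

    #include "ompi/runtime/ompi_module_exchange.h"
    #include "ompi/mca/btl/tcp/btl_tcp.h"   /* declares mca_btl_tcp_component */

    /* Illustrative stand-in for mca_btl_tcp_addr_t. */
    struct tcp_addr_sketch {
        uint32_t addr;   /* IPv4 address, network byte order */
        uint16_t port;   /* listening port, network byte order */
    };

    /* Each process publishes its listening address, keyed by the
     * component; the grpcomm layer is expected to store it per-process. */
    static int publish_my_address(const struct tcp_addr_sketch *info)
    {
        return ompi_modex_send(&mca_btl_tcp_component.super.btl_version,
                               info, sizeof(*info));
    }

    /* The connecting side asks for one specific peer's blob.  If the
     * grpcomm module hands back the first on-node rank's data for every
     * peer on that node, two vpids resolve to the same address/port --
     * exactly what the verbose output above shows. */
    static int lookup_peer_address(ompi_proc_t *proc,
                                   struct tcp_addr_sketch **out)
    {
        size_t size;
        return ompi_modex_recv(&mca_btl_tcp_component.super.btl_version,
                               proc, (void **)out, &size);
    }

Note that the modex payload carries only addressing data; the peer's identity is verified only later, at connect time, via the ack Damien mentions -- which is why a grpcomm mix-up surfaces as "received unexpected process identifier" rather than as an earlier error.
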
>>>
>>> In fact, the modex uses "orte_grpcomm.set_proc_attr()" and "orte_grpcomm.get_proc_attr()" to exchange data. The problem is that "grpcomm:hier" doesn't distinguish between two peers on the same node: in my tests, the IP parameters of the first rank on the selected node are always returned.
>>>
>>> Is "grpcomm:hier" restricted to "btl:sm" and "btl:openib"?
>>>
>>> --------
>>>
>>> One easy solution to fix this problem is to add rank information to the "name" variable in
>>> - ompi/runtime/ompi_module_exchange.c:ompi_modex_send()
>>> - ompi/runtime/ompi_module_exchange.c:ompi_modex_recv()
>>> but I dislike it.
>>>
>>> Does someone have a better solution?
>>>
>>> Thank you,
>>> Damien
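
If I read Damien's proposed workaround correctly, it amounts to making the modex attribute name unique per process. A hypothetical sketch of what that could look like in ompi_modex_send() -- paraphrased, not a verbatim diff against ompi/runtime/ompi_module_exchange.c:

    #include <stdio.h>                               /* asprintf */
    #include "ompi/runtime/ompi_module_exchange.h"
    #include "orte/mca/grpcomm/grpcomm.h"
    #include "orte/runtime/orte_globals.h"           /* ORTE_PROC_MY_NAME */

    int ompi_modex_send(mca_base_component_t *source_component,
                        const void *data, size_t size)
    {
        int rc;
        char *component, *name;

        component = mca_base_component_to_string(source_component);
        if (NULL == component) {
            return OMPI_ERR_OUT_OF_RESOURCE;
        }
        /* Workaround: fold our own vpid into the attribute name so two
         * peers on the same node can no longer collide, even if
         * grpcomm:hier mixes up per-process data. */
        asprintf(&name, "%s-%u", component,
                 (unsigned) ORTE_PROC_MY_NAME->vpid);
        free(component);
        if (NULL == name) {
            return OMPI_ERR_OUT_OF_RESOURCE;
        }

        rc = orte_grpcomm.set_proc_attr(name, data, size);
        free(name);
        return rc;
    }

ompi_modex_recv() would have to build the key from the target proc's vpid in the same way. This hides the symptom but leaves the actual bookkeeping bug in grpcomm:hier untouched, and it grows every component's modex key -- presumably why Damien dislikes it and is asking for a better option.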