Very strange - I'll bet it is something in the hier modex algo that is losing the info about where the data came from. I'll take a look.
On Mar 16, 2011, at 2:25 PM, George Bosilca wrote:

> Actually I think that Damien's analysis is correct. On an 8-node cluster
>
> mpirun -npernode 1 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv
>
> does work, while
>
> mpirun -npernode 2 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv
>
> doesn't. As soon as I remove the grpcomm hier (i.e., use the "bad" module instead), everything works as expected.
>
> I just committed a patch (r24534) to the TCP BTL to output more information. Here is what I get when I add --mca btl_base_verbose 100 to the mpirun:
>
> [node02:01565] btl: tcp: attempting to connect() to [[14725,1],0] address 192.168.3.1 on port 1024
> [node02:01565] btl: tcp: attempting to connect() to [[14725,1],1] address 192.168.3.1 on port 1024
> [node01:31562] btl: tcp: attempting to connect() to [[14725,1],2] address 192.168.3.2 on port 1026
> [node01:31561] btl: tcp: attempting to connect() to [[14725,1],2] address 192.168.3.2 on port 1026
> [node01:31562] btl: tcp: attempting to connect() to [[14725,1],3] address 192.168.3.2 on port 1026
>
> The "-npernode 2" option places 2 processes per node, so vpids 0 and 1 will be on node01 and vpids 2 and 3 will be on node02. Looking at the BTL TCP connection attempts, one can clearly see that process 01565 on node02 thinks that both vpid 0 and vpid 1 can be reached at address 192.168.3.1 on port 1024, which is obviously wrong: two distinct processes cannot both be listening on the same address and port.
>
> Since removing grpcomm hier solves the problem, I would expect the issue is not in the TCP BTL.
>
> george.
>
> On Mar 16, 2011, at 15:16, Ralph Castain wrote:
>
>> I suspect something else is wrong - the grpcomm system never has any visibility into what data goes into the modex or how that data is used. In other words, if the tcp btl isn't providing adequate info, it would fail regardless of which grpcomm module was in use. So your statement about the hier module not distinguishing between peers on the same node doesn't make sense - the hier module has no idea that a tcp btl even exists, let alone has anything to do with the modex data.
>>
>> You might take a look at how the tcp btl is picking its sockets. The srun direct-launch method may be setting envars that confuse it, perhaps causing the procs to all pick the same socket.
>>
>> On Mar 16, 2011, at 12:48 PM, Damien Guinier wrote:
>>
>>> Hi all,
>>>
>>> From my tests, it is impossible to use "btl:tcp" with "grpcomm:hier". The "grpcomm:hier" module is important because the "srun" launch protocol can't use any other "grpcomm" module.
>>> You can reproduce this bug by using "btl:tcp" and "grpcomm:hier" when you create a ring (like IMB Sendrecv):
>>>
>>> $> salloc -N 2 -n 4 mpirun --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 Sendrecv
>>> salloc: Granted job allocation 2979
>>> [cuzco95][[59536,1],2][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[59536,1],0]
>>> [cuzco92][[59536,1],0][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[59536,1],2]
>>> ^C
>>> $>
>>>
>>> This error message shows that "btl:tcp" has created a connection to a peer, but not to the right one (the peer identity is checked with the "ack").
>>> To create a connection between two peers with "btl:tcp":
>>> - each peer broadcasts its IP parameters with ompi_modex_send();
>>> - the IP parameters of the selected peer are received with ompi_modex_recv().
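As context for the exchange Damien describes above (each process publishing its TCP contact information, each peer looking it up), here is a minimal sketch of the publish/lookup pattern. It is not the actual TCP BTL code: the struct layout, helper names, and include paths are assumptions for illustration, and only the ompi_modex_send()/ompi_modex_recv() calls and their role are taken from the discussion in this thread.

/* Illustrative sketch only -- not the real btl_tcp implementation.  The
 * modex API is assumed from ompi/runtime/ompi_module_exchange.h; the
 * address struct and helper names below are hypothetical (the real data
 * lives in the TCP BTL sources). */
#include <stdint.h>
#include <string.h>
#include "opal/mca/mca.h"                        /* mca_base_component_t (assumed path) */
#include "ompi/constants.h"                      /* OMPI_SUCCESS/OMPI_ERROR (assumed path) */
#include "ompi/proc/proc.h"                      /* ompi_proc_t (assumed path) */
#include "ompi/runtime/ompi_module_exchange.h"   /* ompi_modex_send/recv (assumed path) */

/* Hypothetical per-process contact record. */
struct peer_tcp_addr {
    uint32_t ip;     /* IPv4 address this process listens on */
    uint16_t port;   /* TCP port this process listens on     */
};

/* Publish side: every process advertises its own listening endpoint.
 * The grpcomm framework is expected to associate this blob with the
 * publishing process, not just with the node it runs on. */
static int publish_my_tcp_addr(const mca_base_component_t *btl_component,
                               uint32_t ip, uint16_t port)
{
    struct peer_tcp_addr me = { .ip = ip, .port = port };
    return ompi_modex_send(btl_component, &me, sizeof(me));
}

/* Lookup side: ask for the record that one specific peer published.
 * If the grpcomm module keys the data per node instead of per process,
 * every peer on a node comes back with the first local rank's endpoint,
 * which matches the duplicate address/port pairs in the verbose log. */
static int lookup_peer_tcp_addr(const mca_base_component_t *btl_component,
                                ompi_proc_t *peer, struct peer_tcp_addr *out)
{
    void *blob = NULL;
    size_t size = 0;
    int rc = ompi_modex_recv(btl_component, peer, &blob, &size);
    if (OMPI_SUCCESS != rc || sizeof(*out) != size) {
        return OMPI_ERROR;
    }
    memcpy(out, blob, sizeof(*out));
    return OMPI_SUCCESS;
}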
>>>
>>> In fact, the modex uses "orte_grpcomm.set_proc_attr()" and "orte_grpcomm.get_proc_attr()" to exchange data. The problem is that "grpcomm:hier" doesn't distinguish between two peers on the same node: in my tests, the IP parameters of the first rank on the selected node are always returned.
>>>
>>> Is "grpcomm:hier" restricted to "btl:sm" and "btl:openib"?
>>>
>>> --------
>>>
>>> One easy solution to this problem is to add rank information to the "name" variable in
>>> - ompi/runtime/ompi_module_exchange.c:ompi_modex_send()
>>> - ompi/runtime/ompi_module_exchange.c:ompi_modex_recv()
>>> but I dislike it. Does someone have a better solution?
>>>
>>> Thank you,
>>> Damien
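For concreteness, Damien's suggestion above (adding rank information to the attribute name used by ompi_modex_send()/ompi_modex_recv()) would amount to something like the sketch below. This is only an illustration, not a proposed patch: the field and type names (mca_type_name, mca_component_name, orte_vpid_t) and include paths are assumed from the contemporary tree, and the grpcomm calls are shown only as comments rather than with exact signatures.

/* Sketch only: key the modex attribute by component AND by the sender's
 * vpid, so a per-node cache cannot collapse two local ranks into one
 * entry.  Error handling is minimal. */
#include <stdio.h>          /* asprintf (GNU/BSD extension)        */
#include <stdlib.h>         /* free                                */
#include "opal/mca/mca.h"   /* mca_base_component_t (assumed path) */
#include "orte/types.h"     /* orte_vpid_t (assumed path)          */

/* Build e.g. "btl.tcp.3" instead of the rank-agnostic "btl.tcp". */
static char *modex_attr_name(const mca_base_component_t *component,
                             orte_vpid_t vpid)
{
    char *name = NULL;
    if (0 > asprintf(&name, "%s.%s.%u",
                     component->mca_type_name,
                     component->mca_component_name,
                     (unsigned int) vpid)) {
        return NULL;
    }
    return name;   /* caller frees */
}

/* The send side would use its own vpid:
 *     name = modex_attr_name(source_component, ORTE_PROC_MY_NAME->vpid);
 *     orte_grpcomm.set_proc_attr(name, data, size);
 *
 * and the recv side the peer's vpid:
 *     name = modex_attr_name(component, proc->proc_name.vpid);
 *     orte_grpcomm.get_proc_attr(proc->proc_name, name, buffer, size);
 */

Damien himself dislikes this workaround, and per Ralph's reply at the top of the thread the cleaner fix presumably lives in the hier module's bookkeeping of which process each attribute came from, rather than in the modex key.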