Very strange - I'll bet it is something in the hier modex algo that is losing 
the info about where the data came from. I'll take a look.


On Mar 16, 2011, at 2:25 PM, George Bosilca wrote:

> Actually, I think Damien's analysis is correct. On an 8-node cluster
> 
> mpirun -npernode 1 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 
> Sendrecv
> 
> does work, while 
> 
> mpirun -npernode 2 -np 4 --mca grpcomm hier --mca btl self,sm,tcp ./IMB-MPI1 
> Sendrecv
> 
> doesn't. As soon as I remove the grpcomm hier setting (i.e., use the bad 
> module instead), everything works as expected.
> 
> I just committed a patch (r24534) to the TCP BTL to output more information 
> and here is what I get when I add --mca btl_base_verbose 100 to the mpirun.
> 
> [node02:01565] btl: tcp: attempting to connect() to [[14725,1],0] address 
> 192.168.3.1 on port 1024
> [node02:01565] btl: tcp: attempting to connect() to [[14725,1],1] address 
> 192.168.3.1 on port 1024
> [node01:31562] btl: tcp: attempting to connect() to [[14725,1],2] address 
> 192.168.3.2 on port 1026
> [node01:31561] btl: tcp: attempting to connect() to [[14725,1],2] address 
> 192.168.3.2 on port 1026
> [node01:31562] btl: tcp: attempting to connect() to [[14725,1],3] address 
> 192.168.3.2 on port 1026
> 
> The "-npernode 2" will place 2 processes per node, so the vpid 0 and 1 will 
> be on node01 and vpid 2 and 3 will be on node02. Looking at the BTL TCP 
> connection attempts one can clearly see that process 01565 on node02 think 
> that both vpid 0 and 1 can be joined using address 192.168.3.1 on port 1024, 
> which is obviously wrong.
> 
> Since removing grpcomm hier solves the problem, I would expect the issue is 
> not in the TCP BTL.
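> 
> A minimal sketch (plain POSIX sockets, not Open MPI code) of why that output 
> cannot reflect reality: two processes on the same node cannot both listen on 
> 192.168.3.1:1024, so at least one of the advertised endpoints must be the 
> wrong process's modex data. The port number is taken from the verbose output 
> above.
> 
>     /* Illustrative only: a second bind() to an already-bound TCP port fails
>      * with EADDRINUSE, so two peers on one node must advertise different
>      * ports. Generic POSIX code, not part of the tcp btl. */
>     #include <stdio.h>
>     #include <string.h>
>     #include <netinet/in.h>
>     #include <sys/socket.h>
>     #include <unistd.h>
> 
>     int main(void)
>     {
>         struct sockaddr_in sa;
>         memset(&sa, 0, sizeof(sa));
>         sa.sin_family = AF_INET;
>         sa.sin_addr.s_addr = htonl(INADDR_ANY);
>         sa.sin_port = htons(1024);   /* port reported in the verbose output */
> 
>         int a = socket(AF_INET, SOCK_STREAM, 0);
>         int b = socket(AF_INET, SOCK_STREAM, 0);
>         if (bind(a, (struct sockaddr *)&sa, sizeof(sa)) != 0)
>             perror("first bind");
>         if (bind(b, (struct sockaddr *)&sa, sizeof(sa)) != 0)
>             perror("second bind");   /* fails with EADDRINUSE */
>         close(a);
>         close(b);
>         return 0;
>     }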
> 
>  george.
> 
> 
> On Mar 16, 2011, at 15:16, Ralph Castain wrote:
> 
>> I suspect something else is wrong - the grpcomm system never has any 
>> visibility into what data goes into the modex, or how that data is used. In 
>> other words, if the tcp btl isn't providing adequate info, then it would 
>> fail regardless of which grpcomm module was in use. So your statement about 
>> the hier module not distinguishing between peers on the same node doesn't 
>> make sense - the hier module has no idea that a tcp btl even exists, let 
>> alone has anything to do with the modex data.
>> 
>> You might take a look at how the tcp btl is picking its sockets. The srun 
>> direct launch method may be setting envars that confuse it, perhaps causing 
>> the procs to all pick the same socket.
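>> 
>> One generic way to sanity-check that (plain POSIX sockets, not the tcp btl 
>> source): if each proc binds its listening socket to port 0 and reads the 
>> result back with getsockname(), the kernel hands every process on a node a 
>> distinct port, so printing that value per proc quickly shows whether the 
>> listeners really collide or whether only the advertised data is wrong.
>> 
>>     /* Generic diagnostic sketch: let the kernel pick the listening port and
>>      * report what was actually bound. Not Open MPI code. */
>>     #include <stdio.h>
>>     #include <string.h>
>>     #include <netinet/in.h>
>>     #include <sys/socket.h>
>>     #include <unistd.h>
>> 
>>     int main(void)
>>     {
>>         struct sockaddr_in sa;
>>         socklen_t len = sizeof(sa);
>>         int fd = socket(AF_INET, SOCK_STREAM, 0);
>> 
>>         memset(&sa, 0, sizeof(sa));
>>         sa.sin_family = AF_INET;
>>         sa.sin_addr.s_addr = htonl(INADDR_ANY);
>>         sa.sin_port = htons(0);              /* 0 = kernel chooses the port */
>> 
>>         bind(fd, (struct sockaddr *)&sa, sizeof(sa));
>>         listen(fd, 8);
>>         getsockname(fd, (struct sockaddr *)&sa, &len);
>>         printf("pid %d listening on port %d\n", (int)getpid(), ntohs(sa.sin_port));
>>         close(fd);
>>         return 0;
>>     }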
>> 
>> 
>> On Mar 16, 2011, at 12:48 PM, Damien Guinier wrote:
>> 
>>> Hi all
>>> 
>>> From my tests, it is impossible to use "btl:tcp" with "grpcomm:hier". The 
>>> "grpcomm:hier" module is important because the "srun" launch protocol can't 
>>> use any other "grpcomm" module.
>>> You can reproduce this bug by using "btl:tcp" and "grpcomm:hier" when you 
>>> create a ring (like IMB Sendrecv):
>>> 
>>> $>salloc -N 2 -n 4 mpirun --mca grpcomm hier --mca btl self,sm,tcp 
>>> ./IMB-MPI1 Sendrecv
>>> salloc: Granted job allocation 2979
>>> [cuzco95][[59536,1],2][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack]
>>>  received unexpected process identifier [[59536,1],0]
>>> [cuzco92][[59536,1],0][btl_tcp_endpoint.c:486:mca_btl_tcp_endpoint_recv_connect_ack]
>>>  received unexpected process identifier [[59536,1],2]
>>> ^C
>>> $>
>>> 
>>> This error message shows that "btl:tcp" has created a connection to a peer, 
>>> but not to the right one (the peer identity is checked via the "ack"). A 
>>> connection between two peers with "btl:tcp" is set up as follows (see the 
>>> sketch after the list):
>>> - Each peer broadcasts its IP parameters with ompi_modex_send().
>>> - The IP parameters of the selected peer are retrieved with ompi_modex_recv().
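>>> 
>>> A toy model of that publish/lookup pattern (plain C, not Open MPI code; 
>>> toy_modex_publish()/toy_modex_lookup() are hypothetical stand-ins for 
>>> ompi_modex_send()/ompi_modex_recv()). The point is that the lookup is made 
>>> per process, so a remote peer must get back two distinct ports for the two 
>>> procs on the same node; the second port number is invented for illustration.
>>> 
>>>     #include <stdio.h>
>>> 
>>>     #define MAX_PROCS 8
>>> 
>>>     struct tcp_endpoint { char addr[32]; int port; };  /* the "IP parameters" */
>>>     static struct tcp_endpoint modex_store[MAX_PROCS]; /* one slot per vpid */
>>> 
>>>     /* every peer publishes its own endpoint under its own vpid */
>>>     static void toy_modex_publish(int my_vpid, struct tcp_endpoint ep)
>>>     {
>>>         modex_store[my_vpid] = ep;
>>>     }
>>> 
>>>     /* a connecting peer asks for the endpoint of one specific vpid */
>>>     static struct tcp_endpoint toy_modex_lookup(int peer_vpid)
>>>     {
>>>         return modex_store[peer_vpid];
>>>     }
>>> 
>>>     int main(void)
>>>     {
>>>         /* vpids 0 and 1 live on the same node: same address, different port */
>>>         toy_modex_publish(0, (struct tcp_endpoint){ "192.168.3.1", 1024 });
>>>         toy_modex_publish(1, (struct tcp_endpoint){ "192.168.3.1", 1025 });
>>> 
>>>         /* a peer on the other node must see two distinct endpoints */
>>>         struct tcp_endpoint p0 = toy_modex_lookup(0);
>>>         struct tcp_endpoint p1 = toy_modex_lookup(1);
>>>         printf("vpid 0 -> %s:%d\n", p0.addr, p0.port);
>>>         printf("vpid 1 -> %s:%d\n", p1.addr, p1.port);
>>>         return 0;
>>>     }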
>>> 
>>> In fact, the modex uses "orte_grpcomm.set_proc_attr()" and 
>>> "orte_grpcomm.get_proc_attr()" to exchange the data. The problem is that 
>>> "grpcomm:hier" doesn't differentiate between two peers on the same node: in 
>>> my tests, the IP parameters of the first rank on the selected node are 
>>> always returned.
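>>> 
>>> A hypothetical contrast to the toy above (again plain C, not the 
>>> grpcomm:hier source): if the store is keyed by node instead of by process, 
>>> the second local rank's publish is shadowed and every lookup that resolves 
>>> through that node returns the first rank's IP parameters, which is exactly 
>>> the symptom described here.
>>> 
>>>     #include <stdio.h>
>>>     #include <string.h>
>>> 
>>>     struct tcp_endpoint { char addr[32]; int port; };
>>> 
>>>     #define MAX_NODES 4
>>>     static char node_keys[MAX_NODES][32];
>>>     static struct tcp_endpoint node_store[MAX_NODES];
>>>     static int nkeys = 0;
>>> 
>>>     /* broken publish: the key is only the node name, so the first entry wins */
>>>     static void broken_publish(const char *node, struct tcp_endpoint ep)
>>>     {
>>>         for (int i = 0; i < nkeys; i++)
>>>             if (strcmp(node_keys[i], node) == 0) return;  /* later ranks dropped */
>>>         strcpy(node_keys[nkeys], node);
>>>         node_store[nkeys] = ep;
>>>         nkeys++;
>>>     }
>>> 
>>>     static struct tcp_endpoint broken_lookup(const char *node)
>>>     {
>>>         for (int i = 0; i < nkeys; i++)
>>>             if (strcmp(node_keys[i], node) == 0) return node_store[i];
>>>         return (struct tcp_endpoint){ "", 0 };
>>>     }
>>> 
>>>     int main(void)
>>>     {
>>>         broken_publish("node01", (struct tcp_endpoint){ "192.168.3.1", 1024 }); /* vpid 0 */
>>>         broken_publish("node01", (struct tcp_endpoint){ "192.168.3.1", 1025 }); /* vpid 1: shadowed */
>>> 
>>>         /* lookups for vpid 0 and vpid 1 both resolve through "node01" */
>>>         printf("vpid 0 -> port %d\n", broken_lookup("node01").port);
>>>         printf("vpid 1 -> port %d\n", broken_lookup("node01").port);  /* also 1024 */
>>>         return 0;
>>>     }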
>>> 
>>> 
>>> "grpcomm:hier" is restricted to "btl:sm" and "btl:openib" ?
>>> 
>>> 
>>> --------
>>> 
>>> One easy solution to fix this problem is to add rank information to the 
>>> "name" variable in
>>> -    ompi/runtime/ompi_module_exchange.c:ompi_modex_send()
>>> -    ompi/runtime/ompi_module_exchange.c:ompi_modex_recv()
>>> but I dislike it (a sketch of the idea follows below).
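>>> 
>>> A hypothetical sketch of that workaround only (not a patch; the key format 
>>> and names are made up for illustration): fold the publishing rank into the 
>>> attribute name the modex uses as its key, so two peers on the same node can 
>>> no longer collide.
>>> 
>>>     #include <stdio.h>
>>> 
>>>     int main(void)
>>>     {
>>>         const char *name = "btl.tcp";   /* base attribute name (illustrative) */
>>>         int vpid = 1;                   /* rank of the publishing process */
>>>         char key[64];
>>> 
>>>         /* per-rank key instead of a per-component key */
>>>         snprintf(key, sizeof(key), "%s.vpid%d", name, vpid);
>>>         printf("modex key: %s\n", key); /* -> "btl.tcp.vpid1" */
>>>         return 0;
>>>     }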
>>> 
>>> Does someone have a better solution?
>>> 
>>> 
>>> Thank you,
>>> Damien
>>> 
>> 
>> 
> 
> "To preserve the freedom of the human mind then and freedom of the press, 
> every spirit should be ready to devote itself to martyrdom; for as long as we 
> may think as we will, and speak as we think, the condition of man will 
> proceed in improvement."
>  -- Thomas Jefferson, 1799
> 
> 

