Aha! You are the first to fall thru the timeout. How interesting.

Can you please try adding “-mca oob_tcp_connect_timeout 5:0”?
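
In case it helps, here is roughly what I mean, as a sketch built from the command line in your report quoted below. Only the oob_tcp_connect_timeout option is new. Everything else is copied from your mail, and I am assuming you keep the oob_tcp_if_include setting (drop it to match your other run). The EXTRA and CMD variable names are just for illustration.

```shell
# Extra option suggested above, copied verbatim.
EXTRA="-mca oob_tcp_connect_timeout 5:0"

# Original command line from the quoted report, with the extra option added.
CMD="mpirun $EXTRA -mca oob_tcp_if_include bge0 -mca oob_base_verbose 20 -mca btl sm,self,openib -np 2 -host pcp-j-19,pcp-j-20 examples/ring_c"

# Print the assembled command for review; nothing is launched here.
echo "$CMD"
```

Once the printed line looks right, paste it at the pcp-j-20 prompt as before.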

On Dec 12, 2014, at 8:53 AM, Paul Hargrove <phhargr...@lbl.gov> wrote:
> 
> 
> First, I want to ask what became of the issue discussed in this thread?
>    http://www.open-mpi.org/community/lists/devel/2014/11/16160.php
> I thought we had concluded that one just needed -D_REENTRANT.
> I mention that only for completeness, because I think my current problem is 
> different.
> 
> The following works fine with 1.8.3, making the current behavior a regression.
> 
> I am still on the same system as that previous report, and still/again see a 
> message like the following:
> 
> ------------------------------------------------------------
> A process or daemon was unable to complete a TCP connection
> to another process:
>   Local host:    pcp-j-19
>   Remote host:   172.18.0.120
> This is usually caused by a firewall on the remote host. Please
> check that any firewall (e.g., iptables) has been disabled and
> try again.
> ------------------------------------------------------------
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
> [...etc...]
> 
> It may be worth noting that the hostname pcp-j-19 (172.16.0.119) and the 
> address 172.18.0.120 are on different subnets.
> 
> I CANNOT resolve the issue this time by adding -D_REENTRANT to CFLAGS at 
> configure time (I didn't bother to check if it is there by default now or not).
> 
> NOR can I resolve it by using "-mca oob_tcp_if_include bge0" to allow only 
> the 172.16.0.120 subnet.
> IN FACT, the message is the same with that option, other than "172.18" 
> changing to "172.16".
> 
> I've attached the output generated by "-mca oob_base_verbose 20" both with 
> and without the oob_tcp_if_include.
> 
> I should also note that the following is my full mpirun command, which 
> excludes the tcp BTL.
> pcp-j-20$ mpirun -mca oob_tcp_if_include bge0 -mca oob_base_verbose 20 -mca btl sm,self,openib -np 2 -host pcp-j-19,pcp-j-20 examples/ring_c
> 
> 
> -Paul
> 
> -- 
> Paul H. Hargrove                          phhargr...@lbl.gov
> Computer Languages & Systems Software (CLaSS) Group
> Computer Science Department               Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
> <stdout-inc.txt><stderr-2if.txt>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/12/16551.php
