There are many differences between the trunk and 1.8 regarding the TCP BTL.
The main one I remember is that the TCP BTL in the trunk reports errors
to the upper level via the callbacks attached to fragments, while the 1.8
TCP BTL doesn't.

So, I guess that once a connection to a particular endpoint fails, the
trunk gets the error reported via the callback and then takes some drastic
measure. In 1.8 we might fall back and try another IP address before
giving up.
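
To make the guess above concrete, here is a minimal sketch of the two
behaviors. It is not the actual TCP BTL code: every type and function name
(peer_addr_t, frag_cb_t, try_connect, connect_with_fallback,
connect_and_report, record_status) is hypothetical, and the "reachable" flag
simply stands in for whether a real connect() would succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration only; not Open MPI data structures. */
typedef struct {
    const char *ip;   /* address advertised by the peer */
    bool reachable;   /* stand-in for "connect() would succeed" */
} peer_addr_t;

/* Hypothetical fragment callback: status 0 on success, negative on error. */
typedef void (*frag_cb_t)(int status, void *ctx);

/* Stand-in for a real connection attempt. */
static bool try_connect(const peer_addr_t *addr)
{
    return addr->reachable;
}

/* Assumed 1.8-like behavior: walk the peer's addresses and return the
 * index of the first one that connects, or -1 if all of them fail. */
static int connect_with_fallback(const peer_addr_t *addrs, size_t n)
{
    assert(addrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (try_connect(&addrs[i])) {
            return (int)i;
        }
    }
    return -1;
}

/* Assumed trunk-like behavior: only the first address is tried, and a
 * failure goes straight to the upper layer through the callback. */
static int connect_and_report(const peer_addr_t *addrs, size_t n,
                              frag_cb_t cb, void *ctx)
{
    assert(addrs != NULL || n == 0);
    if (n == 0 || !try_connect(&addrs[0])) {
        cb(-1, ctx);  /* the upper layer decides what to do with the error */
        return -1;
    }
    cb(0, ctx);
    return 0;
}

/* Example callback that stores the reported status in an int. */
static void record_status(int status, void *ctx)
{
    *(int *)ctx = status;
}
```

With a peer that advertises 192.168.255.1 (unreachable, as in Gilles' setup)
followed by a made-up reachable address such as 10.0.0.2,
connect_with_fallback() returns index 1, while connect_and_report() hands -1
to the callback without trying the second address.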

  George.



On Wed, Aug 13, 2014 at 10:55 AM, Jeff Squyres (jsquyres) <
jsquy...@cisco.com> wrote:

> I think this is expected behavior.
>
> If you have networks that you need Open MPI to ignore (e.g., a private
> network that *looks* reachable between multiple servers -- because the
> interfaces are on the same subnet -- but actually *isn't*), then the
> include/exclude mechanism is the right way to exclude them.
>
> That being said, I'm not sure why the behavior is different between trunk
> and v1.8.
>
>
> On Aug 13, 2014, at 1:41 AM, Gilles Gouaillardet <
> gilles.gouaillar...@iferc.org> wrote:
>
> > Folks,
> >
> > i noticed mpirun (trunk) hangs when running any mpi program on two nodes
> > *and* each node has a private network with the same ip
> > (in my case, each node has a private network to a MIC)
> >
> > in order to reproduce the problem, you can simply run (as root) on the
> > two compute nodes
> > brctl addbr br0
> > ifconfig br0 192.168.255.1 netmask 255.255.255.0
> >
> > mpirun will hang
> >
> > a workaround is to add --mca btl_tcp_if_include eth0
> >
> > v1.8 does not hang in this case
> >
> > Cheers,
> >
> > Gilles
> > _______________________________________________
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15623.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/08/15631.php
>
