I had a typo in my btl_tcp_if_exclude value such that it was effectively

    mpirun --mca btl_tcp_if_exclude bogus ...

instead of ignoring the actual interface I wanted to ignore. And since I wasn't ignoring the special loopback device that I have on some machines, every single MPI job hung: the processes tried to use those interfaces to communicate with processes on other nodes, which those interfaces could not reach.

On Feb 4, 2013, at 5:56 PM, "Barrett, Brian W" <bwba...@sandia.gov> wrote:

> I'm confused; why is it disastrous to have an interface in if_exclude that
> doesn't exist? I can see it being a problem if we don't exclude something in
> the list, but the other way is (in my opinion) harmless but with a useful use
> case...
>
> Brian
>
> Sent with Good (www.good.com)
>
> -----Original Message-----
> From: Jeff Squyres (jsquyres) [mailto:jsquy...@cisco.com]
> Sent: Monday, February 04, 2013 06:47 PM Mountain Standard Time
> To: Open MPI Developers
> Subject: [EXTERNAL] Re: [OMPI devel] [OMPI svn] svn:open-mpi r28016 -
> trunk/ompi/mca/btl/tcp
>
> On Feb 4, 2013, at 2:03 PM, George Bosilca <bosi...@icl.utk.edu> wrote:
>
>> The two behaviors you describe for include and exclude do not look
>> conflicting to me. Inclusion is a strong request: the user enforces the usage
>> of a specific interface. If the interface is not available, then we have a
>> problem. Exclude, on the other hand, must enforce that a specific interface
>> is not in use, which can be quite simple if the interface is not
>> available.
>
> I still maintain that it's equally disastrous if you don't exclude the
> correct interfaces (I lost 2 nights of MTT because of this!).
>
>> I'm not a fan of the nowarn option. Seems like a lot of code with limited
>> interest, especially if we only plan to support it in TCP.
>
> This is a good point -- I wonder what openib (and others?) that support the
> *_if_include and *_if_exclude notation do. Do they warn / error if you specify
> an invalid interface?
>
>> If you need specialized arguments for some of your nodes, here is what I do:
>> rename the binaries to .orig, and use the original name to create a sh
>> script that will change the value of mca_param_files to something based on
>> the host name (if such a file exists) and then call the .orig executable.
>> Works like a charm, even when a batch scheduler is used.
>
> That will still be quite difficult to do in MTT. Remember: all the tests
> that are run in MTT are shared across all of us via the ompi-tests SVN repo.
> Are you suggesting that I alias every test in the ompi-tests SVN with a
> public script that you should run that should look for some site-specific MCA
> override param file?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
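
To make the disagreement in this thread concrete, here is a minimal Python sketch (not Open MPI code) of the two policies being discussed: an include list that errors on an unknown interface, and an exclude list that only warns about one. The function name select_interfaces and the interface names lo, eth0, and virbr0 are hypothetical, and applying only one list at a time is an assumption of the sketch.

```python
import warnings


def select_interfaces(available, include=None, exclude=None):
    """Filter `available` by an include list or an exclude list.

    Hypothetical helper for illustration only; it is not taken from
    the Open MPI TCP BTL. Assumption: if `include` is given it is
    used and `exclude` is ignored.
    """
    known = set(available)

    if include:
        # Inclusion is a strong request: an unknown name is an error.
        missing = [name for name in include if name not in known]
        if missing:
            raise ValueError(f"unknown interface(s) in include list: {missing}")
        return [name for name in available if name in include]

    exclude = list(exclude or [])
    for name in exclude:
        if name not in known:
            # Exclusion of a missing interface is allowed, but a typo
            # would otherwise pass silently, so flag it.
            warnings.warn(f"exclude list names unknown interface: {name}")
    return [name for name in available if name not in exclude]


if __name__ == "__main__":
    interfaces = ["lo", "eth0", "virbr0"]  # hypothetical interface names
    # A mistyped exclude entry removes nothing, so "lo" stays selectable.
    print(select_interfaces(interfaces, exclude=["bogus"]))
    # The intended exclusion.
    print(select_interfaces(interfaces, exclude=["lo"]))
```

With a mistyped exclude entry nothing is removed, so the loopback device stays in the candidate list; in this sketch the warning is the only signal that the exclusion did not take effect.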