If it ain't broke, don't fix it. I am more than skeptical about the
usefulness of this new notation.

The two behaviors you describe for include and exclude do not look
conflicting to me. Inclusion is a strong request: the user enforces the
use of a specific interface. If the interface is not available, then
we have a problem. Exclude, on the other hand, must enforce that a
specific interface is not in use, which is trivially true if the
interface is not available.
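
To make the distinction concrete, here is a minimal sketch in plain sh. It
is not the TCP BTL code: the function names are hypothetical, and the
interface lists are space-separated here only to keep the shell simple
(the real parameters take comma-separated lists).

```sh
# Hypothetical helper: succeed when interface "$1" appears in the
# space-separated list "$2".
has_if() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# include: every requested interface ($1) must exist among the available
# ones ($2); a missing interface is an error.
check_include() {
    for name in $1; do
        has_if "$name" "$2" || return 1
    done
    return 0
}

# exclude: print the available interfaces ($2) minus the excluded ones ($1).
# An excluded interface that does not exist is already "not in use", so it
# is simply a no-op.
apply_exclude() {
    remaining=""
    for name in $2; do
        has_if "$name" "$1" || remaining="$remaining $name"
    done
    printf '%s\n' "${remaining# }"
}
```

With "lo eth0 eth1" available, check_include "eth0 eth5" fails, while
apply_exclude "eth5 lo" just yields "eth0 eth1".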

I'm not a fan of the nowarn option. Seems like a lot of code with
limited benefit, especially if we only plan to support it in TCP.

If you need specialized arguments for some of your nodes, here is what
I do: rename the binaries to .orig, and use the original name to
create a sh script that will change the value of mca_param_files to
something based on the host name (if such a file exists) and then call
the .orig executable. Works like a charm, even when a batch scheduler
is used.
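
A minimal sketch of such a wrapper follows. The directory layout, the
"<hostname>.conf" naming, and the function names are assumptions made for
illustration. It also assumes that mca_param_files can be set through the
environment as OMPI_MCA_mca_param_files, following the usual OMPI_MCA_
prefix convention, and that setting it replaces the default list of
parameter files.

```sh
#!/bin/sh
# Hypothetical wrapper installed under the original binary name (for
# example "mpirun"), with the real executable renamed to "mpirun.orig".

# Assumed layout: one optional parameter file per host in this directory.
PARAM_DIR="${PARAM_DIR:-/etc/openmpi/hosts}"

# Print the parameter file for host "$2" in directory "$1" if it exists;
# print nothing otherwise.
host_param_file() {
    candidate="$1/$2.conf"
    if [ -f "$candidate" ]; then
        printf '%s\n' "$candidate"
    fi
}

# Export the override only when a file exists for this host, then replace
# this process with the real executable, passing all arguments through.
launch() {
    real="$1"
    shift
    file=$(host_param_file "$PARAM_DIR" "$(hostname)")
    if [ -n "$file" ]; then
        OMPI_MCA_mca_param_files="$file"
        export OMPI_MCA_mca_param_files
    fi
    exec "$real" "$@"
}

# Entry point once installed as the wrapper (left commented in this sketch):
# launch "$0.orig" "$@"
```

Since the decision is made on the node where the wrapper runs, it does not
matter which nodes the scheduler ends up picking.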

  George.

On Mon, Feb 4, 2013 at 12:02 PM, Jeff Squyres (jsquyres)
<jsquy...@cisco.com> wrote:
> On Feb 1, 2013, at 9:59 PM, "Barrett, Brian W" <bwba...@sandia.gov> wrote:
>
>> I don't think this is right either. Excluding a device that doesn't exist 
>> has many use cases. Such as disabling a network that only exists on part of 
>> the cluster.  I'm not sure about what to do with seq; it's more like include 
>> than exclude.
>
> Hmm.  I've now given this quite a bit of thought.  Here's what I think:
>
> 1. Just like there might be good reasons to exclude non-existent interfaces 
> (e.g., networks that only exist on part of the cluster), the same argument 
> could be made for *including* non-existent interfaces.
>
> 2. It seems odd to me to have different behavior for non-existent interfaces 
> between include, exclude, and/or seq.
>
> 3. We have a very strong precedent throughout OMPI that if a human asks for 
> something that OMPI can't deliver, OMPI should error.  According to this, and 
> according to the Law of Least Surprise, I would think that if I typo an 
> exclude interface name, OMPI should error and make a human figure it out.
>
> 4. If someone wants different includes/excludes in different parts of the 
> cluster, then they should have per-node values for these MCA params.
>
> 5. That being said, #4 is not always feasible.  Concrete example (which is 
> why this whole thing started, incidentally): in my MTT cluster at Cisco, I 
> have *some* nodes with back-to-back interfaces.  I can't think of a good way 
> to have per-node MCA params in an MTT run that is SLURM-queued and may end up 
> on random nodes in my cluster -- that may or may not include nodes with 
> loopback interfaces.
>
> So how about this compromise:
>
> If an invalid include, exclude, or if_seq interface is specified:
> - If that interface is prefaced with "nowarn:", silently ignore that token
> - Otherwise, display a show_help message and ignore the TCP BTL
>
> For example:
>
>     mpirun --mca btl_tcp_if_include nowarn:eth5,eth6
>
> - If eth5 doesn't exist, the job will continue just as if eth5 wasn't 
> specified
> - If eth6 doesn't exist, the TCP BTL will disqualify itself
>
> (BTW: yes, I'm volunteering to code up whatever we agree on)
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
