Re: [OMPI devel] Multi-environment builds

Jeff Squyres Tue, 10 Jul 2007 07:39:09 -0400

On Jul 10, 2007, at 6:07 AM, Bogdan Costescu wrote:

For example, I can readily find machines that are running TM, but
also have LSF and SLURM libraries installed (although those
environments are not "active" - the libraries in some cases are old
and stale, usually present because either someone wanted to look at
them or represent an old installation).


Whatever the outcome of this discussion is, please keep in mind that
this represents an exception rather than the rule. So the common cases
of no batch environment or one batch environment installed should work
as effortlessly as possible. Furthermore, keep in mind that there are
lots of people who don't compile Open MPI themselves, but rely on
packages compiled by others (Linux distributions, most likely) - so
don't make life harder for those who produce these packages.

FWIW, this is exactly the reason that we have the "auto as much as possible" behavior today; back in LAM/MPI, we had the problem that [many] users would say "I built LAM, but it doesn't support ABC, even though your manual says that it does! LAM's a piece of junk!" The sad fact is that most people assume that "./configure && make install" will do all the Right magic for their system; efforts at education seemed to fail. So we took the path of least resistance and assumed that if we can find it on your system, we should use it. Specifically: it was more of a support issue than anything else.

1. ... we would only build support for those environments that the
builder specifies, and error out of the build process if multiple
conflicting environments are specified.


I think that Ralph's suggestion (auto unless forced) is better, as it
allows:
- a better chance of finding the environments for people who don't
have too much experience with building Open MPI or hate to RTFM
- control over what is built or not for people who know what they
are doing

This raises the issue of what to do with rsh, but I think we can
handle that one by simply building it wherever possible.


I've been meaning to ask this for some time: is it possible to get rid
of rsh support when building/running in an environment where rsh is
not used (like a TM-based one) ? I'm not trying to achieve security by
doing this (after all, a user can build a separate copy of Open MPI
with rsh support), but just to make sure that the programs that I
build are either using the "blessed" start-up mechanism or error out.


Do either of these work for you?

1. Use the --enable-mca-no-build option as I discussed in a mail a few minutes ago.

2. Remove the "mca_pls_rsh.*" files in $prefix/lib/openmpi.

2. We could laboriously go through all the components and ensure that they
check in their selection logic to see if that environment is active.


I might be missing something in the design of batch systems or
software in general, but how do you decide that an environment is
active or not ?

Most batch systems today set a sentinel environment variable that we check for.
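
As a rough illustration of the sentinel check (a sketch only, not Open MPI's actual code): the variable names PBS_JOBID, SLURM_JOBID and LSB_JOBID are assumptions on my part and are not named in this thread, and the table and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical table pairing a launcher with its sentinel variable.
 * The variable names are assumptions, not taken from this thread. */
struct sentinel {
    const char *launcher;
    const char *variable;
};

const struct sentinel sentinels[] = {
    { "tm",    "PBS_JOBID"   },
    { "slurm", "SLURM_JOBID" },
    { "lsf",   "LSB_JOBID"   },
};

/* A sentinel counts as set only if it exists and is non-empty. */
int sentinel_is_set(const char *value)
{
    return value != NULL && value[0] != '\0';
}

/* Return the name of the first launcher whose sentinel variable is set
 * in the current process environment, or NULL if none is set. */
const char *active_launcher(void)
{
    size_t i;

    for (i = 0; i < sizeof(sentinels) / sizeof(sentinels[0]); ++i) {
        assert(sentinels[i].variable != NULL);
        if (sentinel_is_set(getenv(sentinels[i].variable))) {
            return sentinels[i].launcher;
        }
    }
    return NULL;
}
```

On a machine that merely has the batch system's libraries installed, none of these variables would be set outside of a job, so active_launcher() would return NULL there.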

Can a library check if it's being used in a program ?
Or if that program actually runs ? And if a configuration file exists,
does it mean that the environment is actually active ?

We do not generally assume that the presence of a plugin means that that plugin can run in the current environment. I thought that all framework selection logic was adapted to this philosophy, but apparently Ralph is indicating that some do not. :-)

How to deal
with the case where there are several versions of the same batch
system installed, all using the same configuration files and therefore
being ready to run ?

We assume that Open MPI was built compiling/linking against the Right version. There's not much else we can do if you build against the Wrong version.

And how about the case where there is a machine
reserved for compilations, where libraries are made available but
there is no batch system active ?

That's what the compile-time vs. run-time detection and selection is supposed to be for. The presence of an OMPI component at run-time is not supposed to mean that it can run; it's supposed to be queried and the component can do whatever checks it wants to see if it is supposed to run, and then report "Yes, I can run" / "No, I cannot run" back to Open MPI.
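
To make the query idea concrete, here is a minimal sketch. It is not the real MCA interface: the struct, the function names and the PBS_JOBID check are all hypothetical, and "first component that says yes wins" is a simplification chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical component descriptor: being built and installed says
 * nothing about being usable; the can_run query decides that at run time. */
struct launcher_component {
    const char *name;
    int (*can_run)(void);   /* nonzero means "Yes, I can run" */
};

/* TM example: assume a non-empty PBS_JOBID means we are inside a job. */
int tm_can_run(void)
{
    const char *job = getenv("PBS_JOBID");
    return job != NULL && job[0] != '\0';
}

/* rsh example: always willing to run in this sketch. */
int rsh_can_run(void)
{
    return 1;
}

/* Query each component in order and return the first one that reports
 * it can run, or NULL if every component declines. */
const struct launcher_component *
select_launcher(const struct launcher_component *components, size_t count)
{
    size_t i;

    assert(components != NULL || count == 0);
    for (i = 0; i < count; ++i) {
        if (components[i].can_run != NULL && components[i].can_run()) {
            return &components[i];
        }
    }
    return NULL;
}
```

With a list of { tm, rsh } in that order, the TM component would decline on a compile-only machine where no job is running and the sketch would fall through to rsh; with rsh left out of the list, it would return NULL there instead.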


--
Jeff Squyres
Cisco Systems
