On Jul 10, 2007, at 6:07 AM, Bogdan Costescu wrote:
For example, I can readily find machines that are running TM, but
also have LSF and SLURM libraries installed (although those
environments are not "active" - in some cases the libraries are old
and stale, present either because someone wanted to look at them or
because they are left over from an old installation).
Whatever the outcome of this discussion is, please keep in mind that
this represents an exception rather than the rule. So the common cases
of no batch environment or one batch environment installed should work
as effortlessly as possible. Furthermore, keep in mind that there are
lots of people who don't compile Open MPI themselves, but rely on
packages compiled by others (Linux distributions, most likely) - so
don't make life harder for those who produce these packages.
FWIW, this is exactly the reason that we have the "auto as much as
possible" behavior today; back in LAM/MPI, we had the problem that
[many] users would say "I built LAM, but it doesn't support ABC, even
though your manual says that it does! LAM's a piece of junk!" The
sad fact is that most people assume that "./configure && make
install" will do all the Right magic for their system; efforts at
education seemed to fail. So we took the path of least resistance
and assumed that if we could find it on your system, we should use it.
In short, it was more of a support issue than anything else.
1. ... we would only build support for those environments that the
builder specifies, and error out of the build process if multiple
conflicting environments are specified.
I think that Ralf's suggestion (auto unless forced) is better, as it
allows:
- a better chance of finding the environments for people who don't
have much experience building Open MPI or hate to RTFM
- control over what is or is not built for people who know what they
are doing
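To make the "auto unless forced" policy concrete, here is a minimal sketch of the per-environment decision it implies. It is written in C purely for illustration: the enum and function names are hypothetical, and Open MPI makes this kind of choice in its configure script, not in C code like this.

```c
/* What the builder asked for for one environment (hypothetical names). */
enum build_request {
    REQUEST_AUTO,  /* nothing specified: build it if it can be found */
    REQUEST_YES,   /* explicitly forced on by the builder */
    REQUEST_NO     /* explicitly disabled by the builder */
};

enum build_decision {
    DECISION_SKIP,   /* quietly leave the component out */
    DECISION_BUILD,  /* build the component */
    DECISION_ERROR   /* abort the build: forced on but not found */
};

/* "Auto unless forced": `found` is nonzero if the environment's headers
 * and libraries were detected on the build machine. */
enum build_decision decide_build(enum build_request request, int found)
{
    switch (request) {
    case REQUEST_NO:
        return DECISION_SKIP;
    case REQUEST_YES:
        return found ? DECISION_BUILD : DECISION_ERROR;
    case REQUEST_AUTO:
    default:
        return found ? DECISION_BUILD : DECISION_SKIP;
    }
}
```

The REQUEST_AUTO branch keeps the "build whatever we can find" behavior for people who just run configure, while the REQUEST_YES branch makes a builder who explicitly asked for an environment find out at build time, rather than at run time, that it could not be found.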
This raises the issue of what to do with rsh, but I think we can
handle that one by simply building it wherever possible.
I've been meaning to ask this for some time: is it possible to get rid
of rsh support when building/running in an environment where rsh is
not used (like a TM-based one)? I'm not trying to achieve security by
doing this (after all, a user can build a separate copy of Open MPI
with rsh support), but just to make sure that the programs that I
build either use the "blessed" start-up mechanism or error out.
Would either of these work for you?
1. Use the --enable-mca-no-build option as I discussed in a mail a
few minutes ago.
2. Remove the "mca_pls_rsh.*" files in $prefix/lib/openmpi.
2. We could laboriously go through all the components and ensure
that they check in their selection logic to see if that environment
is active.
I might be missing something in the design of batch systems or
software in general, but how do you decide whether an environment is
active or not?
Most batch systems today set a sentinel environment variable that we
check for.
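As a rough illustration of that kind of check, the sketch below looks for variables that PBS/Torque, SLURM and LSF commonly export inside a job. The table, the variable choices and the function names are assumptions made for this example. They are not taken from the Open MPI source, where each component performs its own check. The environment is passed in as a NULL-terminated array of "NAME=value" strings (the same layout as POSIX `environ`), so the logic can be exercised without touching the real environment.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative sentinel variables that these batch systems commonly set
 * inside a job; not necessarily what any Open MPI component checks. */
struct batch_sentinel {
    const char *system;   /* short name of the batch system */
    const char *env_var;  /* variable that is set only inside a job */
};

static const struct batch_sentinel sentinels[] = {
    { "tm",    "PBS_JOBID"   },   /* PBS / Torque */
    { "slurm", "SLURM_JOBID" },   /* SLURM */
    { "lsf",   "LSB_JOBID"   },   /* LSF */
};

/* Look up NAME in a NULL-terminated array of "NAME=value" strings.
 * Returns a pointer to the value, or NULL if NAME is not present. */
static const char *env_lookup(const char *const envp[], const char *name)
{
    size_t len = strlen(name);
    for (; *envp != NULL; ++envp) {
        /* Require an exact name match: "PBS_JOBID_X=1" must not match. */
        if (strncmp(*envp, name, len) == 0 && (*envp)[len] == '=') {
            return *envp + len + 1;
        }
    }
    return NULL;
}

/* Return the short name of the first batch system whose sentinel
 * variable is set to a non-empty value, or NULL if none is active.
 * Installed libraries play no role here; only the job environment does. */
const char *active_batch_system(const char *const envp[])
{
    size_t i;
    for (i = 0; i < sizeof sentinels / sizeof sentinels[0]; ++i) {
        const char *value = env_lookup(envp, sentinels[i].env_var);
        if (value != NULL && value[0] != '\0') {
            return sentinels[i].system;
        }
    }
    return NULL;
}
```

With a check like this, stale LSF or SLURM libraries on a TM machine are irrelevant, because only variables present in the running job's environment are consulted.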
Can a library check whether it's being used in a program?
Or whether that program actually runs? And if a configuration file exists,
does it mean that the environment is actually active?
We do not generally assume that the presence of a plugin means that
it can run in the current environment. I thought that all
framework selection logic was adapted to this philosophy, but
apparently Ralph is indicating that some frameworks do not. :-)
How do you deal
with the case where several versions of the same batch
system are installed, all using the same configuration files and therefore
all ready to run?
We assume that Open MPI was built compiling/linking against the Right
version. There's not much else we can do if you build against the
Wrong version.
And what about the case where a machine is
reserved for compilation, where the libraries are available but
no batch system is active?
That's what the compile-time vs. run-time detection and selection is
supposed to be for. The presence of an OMPI component at run-time is
not supposed to mean that it can run; it's supposed to be queried and
the component can do whatever checks it wants to see if it is
supposed to run, and then report "Yes, I can run" / "No, I cannot
run" back to Open MPI.
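A minimal sketch of that query-then-select pattern follows. The `struct launcher` type, the priority values and both query functions are hypothetical simplifications written for this example, not Open MPI's actual component interface. Here each query only compares against an already-detected batch system name, whereas a real component can perform whatever checks it needs.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical launcher component (not Open MPI's real structures). */
struct launcher {
    const char *name;
    /* Run-time query: return a priority >= 0 for "Yes, I can run",
     * or -1 for "No, I cannot run". */
    int (*query)(const char *active_batch);
};

/* A TM launcher only agrees to run inside a PBS/Torque job. */
static int tm_query(const char *active_batch)
{
    return (active_batch != NULL && strcmp(active_batch, "tm") == 0) ? 50 : -1;
}

/* An rsh launcher can run almost anywhere, so it reports a low priority. */
static int rsh_query(const char *active_batch)
{
    (void) active_batch;  /* rsh does not depend on a batch system */
    return 10;
}

/* The components that happen to be present at run time. */
static const struct launcher launchers[] = {
    { "tm",  tm_query  },
    { "rsh", rsh_query },
};

/* Query every present component and pick the one reporting the highest
 * priority.  Returns NULL if no component can run. */
const struct launcher *select_launcher(const struct launcher *list, size_t n,
                                       const char *active_batch)
{
    const struct launcher *best = NULL;
    int best_priority = -1;
    size_t i;
    for (i = 0; i < n; ++i) {
        int priority = list[i].query(active_batch);
        if (priority > best_priority) {
            best_priority = priority;
            best = &list[i];
        }
    }
    return best;
}
```

If only the TM component is present (for example, `n = 1`, as if the rsh component's files had been removed), selection outside a TM job returns NULL. That corresponds to the "error out" behavior requested earlier, instead of a silent fallback to rsh.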
--
Jeff Squyres
Cisco Systems