Point taken.
Is this an accurate summary?
1. "Best practices" should be documented, including guidance for
sysadmins to explicitly list which components should be used on
their systems (e.g., in an environment variable or the system-wide
MCA parameters file).
2. It may be useful to have some high-level parameters that select an
entire run-time environment at once, since ORTE has multiple related
frameworks (e.g., RAS and PLS). For example, "orte_base_launcher=tm",
or some such.
On Jul 10, 2007, at 9:08 AM, Ralph H Castain wrote:
Actually, I was talking specifically about configuration at build
time. I realize there are trade-offs here, and suspect we can find
common ground.
The problem with using the options Jeff described is that they require
the builder to know which environments have had their include
files/libraries installed on the file system of this particular
machine. And unfortunately, not every component is protected by these
"sentinel" variables, nor does it appear possible to do so in a
"guaranteed safe" manner.
Note that I didn't say "installed on their machine". In most cases,
these alternative environments are not currently installed at all -
they are stale files, or were placed on the file system by someone who
wanted to look at their documentation, or whatever. The problem is
that Open MPI blindly picks them up and attempts to use them, with
results that are sometimes disastrous and frequently unpredictable.
Hence, the user can be "astonished" to find that an application that
worked perfectly yesterday suddenly segfaults today - because someone
decided one day, for example, to un-tar the bproc files in a public
place where we pick them up, and then someone else (perhaps a sysadmin
or the user themselves) later rebuilt Open MPI to bring in an update.
Now imagine being a software provider who gets the call about a
problem with Open MPI and has to figure out what the heck happened...
My suggested solution may not be the best, which is why I put it out
there for discussion. One alternative might be for us to instruct
sysadmins to put MCA params in their default param file that force
selection of the proper components for each framework. Thus, someone
with an LSF system would enter "pls=lsf ras=lsf sds=lsf" in their
config file to ensure that only LSF was used.
The downside of that approach is that we would have to warn everyone
any time that list changed (e.g., a new component for a new
framework). Another option to help with that problem, of course, would
be a single MCA param (say, something like "enviro=lsf") that Open MPI
would use internally to set the individual components correctly -
i.e., we would hold the list of relevant frameworks internally since
(hopefully) we know what they should be for a given environment.
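To make the single-parameter idea concrete, here is a minimal C
sketch, assuming a hypothetical "enviro" parameter whose value selects
one row of an internal table naming the component each ORTE framework
should use. Only the lsf row comes from the example above; the table
layout and the names enviro_map, lookup_enviro and expand_enviro are
illustrative, not Open MPI's actual API.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table: for each run-time environment, the component that
 * each related ORTE framework should be forced to select. */
struct enviro_map {
    const char *enviro; /* value of the hypothetical "enviro" MCA param */
    const char *pls;    /* component for the pls framework */
    const char *ras;    /* component for the ras framework */
    const char *sds;    /* component for the sds framework */
};

static const struct enviro_map enviro_table[] = {
    { "lsf", "lsf", "lsf", "lsf" },
    /* Rows for other environments would be added here. */
};

/* Return the table row for `enviro`, or NULL if it is unknown. */
static const struct enviro_map *lookup_enviro(const char *enviro)
{
    size_t i;
    for (i = 0; i < sizeof enviro_table / sizeof enviro_table[0]; ++i) {
        if (strcmp(enviro_table[i].enviro, enviro) == 0)
            return &enviro_table[i];
    }
    return NULL;
}

/* Expand `enviro` into per-framework settings such as
 * "pls=lsf ras=lsf sds=lsf". Returns 0 on success, or -1 if the
 * environment is unknown or `buf` is too small for the result. */
int expand_enviro(const char *enviro, char *buf, size_t buflen)
{
    const struct enviro_map *m = lookup_enviro(enviro);
    int n;

    if (m == NULL)
        return -1;
    n = snprintf(buf, buflen, "pls=%s ras=%s sds=%s", m->pls, m->ras, m->sds);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

With this shape, a new framework means one new column in a table that
lives inside Open MPI, rather than a change every sysadmin has to make
to their param file.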
Anyway, I'm glad people are looking at this and suggesting solutions.
It is a problem that seems to have been biting us recently and may
become a bigger issue as the user community grows.
Ralph
On 7/10/07 6:12 AM, "Bogdan Costescu"
<bogdan.coste...@iwr.uni-heidelberg.de> wrote:
On Tue, 10 Jul 2007, Jeff Squyres wrote:
Do either of these work for you?
I will report back in a bit; I'm in the middle of an OS upgrade on
the cluster right now.
But my question was more like: is this a configuration that should
theoretically work? In other words, are there known dependencies on
rsh that would make an rsh-less build not work, or work with reduced
functionality?
Most batch systems today set a sentinel environment variable that we
check for.
I think we are talking about slightly different things - my impression
was that the OP was asking about detection at configure time, while
your statements make perfect sense to me if they refer to detection at
run time. If the OP was indeed asking about run-time detection, then I
apologize for the time you wasted reading and replying to my
questions...
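For the run-time side, a minimal C sketch of sentinel-based detection
follows. It assumes PBS/Torque, LSF and SLURM are recognized by the
job-ID variables they export into each job (PBS_JOBID, LSB_JOBID,
SLURM_JOBID); the returned labels, the "rsh" fallback and the function
names are hypothetical, and this is not the logic Open MPI's
components actually use. The environment is passed in as a
NULL-terminated envp-style array so the check can be exercised without
touching the real process environment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a batch system's sentinel variable to a label. */
struct sentinel {
    const char *env_var; /* variable the batch system exports into each job */
    const char *label;   /* hypothetical name of the matching environment */
};

static const struct sentinel sentinels[] = {
    { "PBS_JOBID",   "tm"    }, /* PBS / Torque */
    { "LSB_JOBID",   "lsf"   }, /* LSF */
    { "SLURM_JOBID", "slurm" }, /* SLURM */
};

/* Look up `name` in a NULL-terminated array of "NAME=value" strings.
 * Returns a pointer to the value, or NULL if the variable is absent. */
static const char *env_value(char *const envp[], const char *name)
{
    size_t len = strlen(name);
    for (; *envp != NULL; ++envp) {
        if (strncmp(*envp, name, len) == 0 && (*envp)[len] == '=')
            return *envp + len + 1;
    }
    return NULL;
}

/* Return the label of the first batch system whose sentinel is set to a
 * non-empty value; fall back to "rsh" (hypothetical) when none is found. */
const char *detect_environment(char *const envp[])
{
    size_t i;
    for (i = 0; i < sizeof sentinels / sizeof sentinels[0]; ++i) {
        const char *value = env_value(envp, sentinels[i].env_var);
        if (value != NULL && value[0] != '\0')
            return sentinels[i].label;
    }
    return "rsh";
}
```

In a real program one would pass the POSIX environ array, or the envp
argument of main, to detect_environment().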
That's what the compile-time vs. run-time detection and selection is
supposed to be for.
Yes, I understand that; it's the same type of mechanism as in LAM/MPI,
which is not that foreign to me ;-)
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Jeff Squyres
Cisco Systems