I think that is quite accurate and would be helpful in resolving the
problem...
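
For illustration only, the single-parameter idea from the quoted thread below (one setting such as "enviro=lsf" that expands into per-framework selections) could be sketched roughly as follows. This is a minimal sketch, not Open MPI code: the `ENVIRONMENT_FRAMEWORKS` table and the function names are hypothetical, and the "enviro" parameter is only a proposal from this discussion. The framework names (pls, ras, sds) are taken from the thread, and the `OMPI_MCA_` prefix is the usual way to pass an MCA parameter through the environment.

```python
# Hypothetical table: which frameworks should be pinned for a given
# run-time environment. The lsf entry follows the example in the thread
# (pls=lsf ras=lsf sds=lsf); the tm entry is an illustrative assumption.
ENVIRONMENT_FRAMEWORKS = {
    "lsf": ("pls", "ras", "sds"),
    "tm": ("pls", "ras"),
}


def expand_environment(enviro):
    """Expand one high-level environment name into per-framework selections."""
    try:
        frameworks = ENVIRONMENT_FRAMEWORKS[enviro]
    except KeyError:
        raise ValueError(f"unknown run-time environment: {enviro!r}") from None
    return {framework: enviro for framework in frameworks}


def as_param_file_lines(selection):
    """Render selections as 'name = value' lines for an MCA parameter file."""
    return [f"{name} = {value}" for name, value in selection.items()]


def as_environment_variables(selection):
    """Render selections as OMPI_MCA_<name> environment variable settings."""
    return {f"OMPI_MCA_{name}": value for name, value in selection.items()}


if __name__ == "__main__":
    for line in as_param_file_lines(expand_environment("lsf")):
        print(line)
```

Keeping the table in one place is the point of the proposal: when a new framework is added, only the internal list changes, and sysadmins keep setting a single value.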


On 7/10/07 10:32 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> Point taken.
> 
> Is this an accurate summary?
> 
> 1. "Best practices" should be documented, to include sysadmins
> specifically itemizing what components should be used on their
> systems (e.g., in an environment variable or the system-wide MCA
> parameters file).
> 
> 2. It may be useful to have some high-level parameters to specify a
> specific run-time environment, since ORTE has multiple, related
> frameworks (e.g., RAS and PLS).  E.g., "orte_base_launcher=tm", or
> somesuch.
> 
> 
> On Jul 10, 2007, at 9:08 AM, Ralph H Castain wrote:
> 
>> Actually, I was talking specifically about configuration at build time. I
>> realize there are trade-offs here, and suspect we can find a common ground.
>> The problem with using the options Jeff described is that they require
>> knowledge on the part of the builder as to what environments have had their
>> include files/libraries installed on the file system of this particular
>> machine. And unfortunately, not every component is protected by these
>> "sentinel" variables, nor does it appear possible to do so in a "guaranteed
>> safe" manner.
>> 
>> Note that I didn't say "installed on their machine". In most cases, these
>> alternative environments are not currently installed at all - they are stale
>> files, or were placed on the file system by someone that wanted to look at
>> their documentation, or whatever. The problem is that Open MPI blindly picks
>> them up and attempts to use them, with sometimes disastrous and frequently
>> unpredictable results.
>> 
>> Hence, the user can be "astonished" to find that an application that worked
>> perfectly yesterday suddenly segfaults today - because someone decided one
>> day, for example, to un-tar the bproc files in a public place where we pick
>> them up, and then someone else (perhaps a sys admin or the user themselves)
>> at some later time rebuilt Open MPI to bring in an update.
>> 
>> Now imagine being a software provider who gets the call about a problem with
>> Open MPI and has to figure out what the heck happened....
>> 
>> My suggested solution may not be the best, which is why I put it out there
>> for discussion. One alternative might be for us to instruct sys admins to
>> put MCA params in their default param file that force selection of the
>> proper components for each framework. Thus, someone with an lsf system would
>> enter:  pls=lsf ras=lsf sds=lsf in their config file to ensure that only lsf
>> was used.
>> 
>> The negative to that approach is that we would have to warn everyone any
>> time that list changed (e.g., a new component for a new framework). Another
>> option to help that problem, of course, would be to set one mca param (say
>> something like "enviro=lsf") that we would use internal to Open MPI to set
>> the individual components correctly - i.e., we would hold the list of
>> relevant frameworks internally since (hopefully) we know what they should be
>> for a given environment.
>> 
>> Anyway, I'm glad people are looking at this and suggesting solutions. It is
>> a problem that seems to be biting us recently and may become a bigger issue
>> as the user community grows.
>> 
>> Ralph
>> 
>> 
>> On 7/10/07 6:12 AM, "Bogdan Costescu"
>> <bogdan.coste...@iwr.uni-heidelberg.de> wrote:
>> 
>>> On Tue, 10 Jul 2007, Jeff Squyres wrote:
>>> 
>>>> Do either of these work for you?
>>> 
>>> Will report back in a bit, I'm now in the middle of an OS upgrade on
>>> the cluster.
>>> 
>>> But my question was more like: is this a configuration that should
>>> theoretically work ? Or in other words, are there known dependencies
>>> on rsh that would make a rsh-less build not work or work with reduced
>>> functionality ?
>>> 
>>>> Most batch systems today set a sentinel environment variable that we
>>>> check for.
>>> 
>>> I think that we talk about slightly different things - my impression
>>> was that the OP was asking about detection at config time, while your
>>> statements make perfect sense to me if they are relative to detection
>>> at run-time. If the OP was indeed asking about run-time detection,
>>> then I apologize for the time you wasted on reading and replying to my
>>> questions...
>>> 
>>>> That's what the compile-time vs. run-time detection and selection is
>>>> supposed to be for.
>>> 
>>> Yes, I understand that, it's the same type of mechanism as in LAM/MPI
>>> which is not that foreign to me ;-)
>> 
>> 
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 

