I think that is quite accurate and would be helpful in resolving the problem...
On 7/10/07 10:32 AM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> Point taken.
>
> Is this an accurate summary?
>
> 1. "Best practices" should be documented, to include sysadmins
> specifically itemizing what components should be used on their
> systems (e.g., in an environment variable or the system-wide MCA
> parameters file).
>
> 2. It may be useful to have some high-level parameters to specify a
> specific run-time environment, since ORTE has multiple, related
> frameworks (e.g., RAS and PLS). E.g., "orte_base_launcher=tm", or
> somesuch.
>
>
> On Jul 10, 2007, at 9:08 AM, Ralph H Castain wrote:
>
>> Actually, I was talking specifically about configuration at build
>> time. I realize there are trade-offs here, and suspect we can find
>> common ground.
>> The problem with using the options Jeff described is that they
>> require the builder to know which environments have had their
>> include files/libraries installed on the file system of this
>> particular machine. And unfortunately, not every component is
>> protected by these "sentinel" variables, nor does it appear possible
>> to do so in a "guaranteed safe" manner.
>>
>> Note that I didn't say "installed on their machine". In most cases,
>> these alternative environments are not currently installed at all;
>> they are stale files, or were placed on the file system by someone
>> who wanted to look at their documentation, or whatever. The problem
>> is that Open MPI blindly picks them up and attempts to use them,
>> with sometimes disastrous and frequently unpredictable results.
>>
>> Hence, the user can be "astonished" to find that an application that
>> worked perfectly yesterday suddenly segfaults today, because someone
>> decided one day, for example, to un-tar the bproc files in a public
>> place where we pick them up, and then someone else (perhaps a sys
>> admin or the user themselves) later rebuilt Open MPI to bring in an
>> update.
>>
>> Now imagine being a software provider who gets the call about a
>> problem with Open MPI and has to figure out what the heck happened....
>>
>> My suggested solution may not be the best, which is why I put it out
>> there for discussion. One alternative might be for us to instruct sys
>> admins to put MCA params in their default param file that force
>> selection of the proper components for each framework. Thus, someone
>> with an LSF system would enter "pls=lsf ras=lsf sds=lsf" in their
>> config file to ensure that only LSF was used.
>>
>> The downside of that approach is that we would have to warn everyone
>> any time that list changed (e.g., a new component for a new
>> framework). Another option to help with that problem, of course,
>> would be to set one MCA param (say, something like "enviro=lsf") that
>> we would use internally within Open MPI to set the individual
>> components correctly; i.e., we would hold the list of relevant
>> frameworks internally since (hopefully) we know what they should be
>> for a given environment.
>>
>> Anyway, I'm glad people are looking at this and suggesting solutions.
>> It is a problem that seems to have been biting us recently and may
>> become a bigger issue as the user community grows.
>>
>> Ralph
>>
>>
>> On 7/10/07 6:12 AM, "Bogdan Costescu"
>> <bogdan.coste...@iwr.uni-heidelberg.de> wrote:
>>
>>> On Tue, 10 Jul 2007, Jeff Squyres wrote:
>>>
>>>> Does either of these work for you?
>>>
>>> Will report back in a bit; I'm now in the middle of an OS upgrade on
>>> the cluster.
>>>
>>> But my question was more like: is this a configuration that should
>>> theoretically work? Or, in other words, are there known dependencies
>>> on rsh that would make an rsh-less build not work, or work with
>>> reduced functionality?
>>>
>>>> Most batch systems today set a sentinel environment variable that we
>>>> check for.
>>>
>>> I think that we are talking about slightly different things: my
>>> impression was that the OP was asking about detection at config time,
>>> while your statements make perfect sense to me if they are relative
>>> to detection at run-time. If the OP was indeed asking about run-time
>>> detection, then I apologize for the time you wasted reading and
>>> replying to my questions...
>>>
>>>> That's what the compile-time vs. run-time detection and selection is
>>>> supposed to be for.
>>>
>>> Yes, I understand that; it's the same type of mechanism as in
>>> LAM/MPI, which is not that foreign to me ;-)
>>
>>
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
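Ralph's single-parameter idea ("enviro=lsf"), like Jeff's "orte_base_launcher=tm" variant, amounts to a lookup table kept inside Open MPI that expands one environment name into per-framework component selections. What follows is a minimal Python sketch of that expansion under stated assumptions, not Open MPI code. The `ENVIRONMENTS` table and the function names are hypothetical. The "lsf" row mirrors the "pls=lsf ras=lsf sds=lsf" example in the thread, and the "tm" row is an illustrative guess based on the RAS/PLS frameworks Jeff mentions. The expansion is rendered as `name = value` lines, the format of the system-wide MCA parameters file. The same selections could also be exported as `OMPI_MCA_<name>` environment variables.

```python
# Hypothetical table: environment name -> {framework: component}.
# The "lsf" row mirrors the example in the thread; the "tm" row is an
# illustrative assumption, not Open MPI's actual component list.
ENVIRONMENTS = {
    "lsf": {"pls": "lsf", "ras": "lsf", "sds": "lsf"},
    "tm": {"pls": "tm", "ras": "tm"},
}


def expand_environment(enviro: str) -> dict[str, str]:
    """Expand one high-level setting (e.g. enviro=lsf) into per-framework
    component selections. An unknown name raises KeyError, so a typo fails
    loudly instead of silently falling back to auto-detection."""
    try:
        # Return a copy so callers cannot mutate the shared table.
        return dict(ENVIRONMENTS[enviro])
    except KeyError:
        raise KeyError(
            f"unknown environment {enviro!r}; known: {sorted(ENVIRONMENTS)}"
        ) from None


def to_param_file(selections: dict[str, str]) -> str:
    """Render selections as 'name = value' lines, sorted by framework,
    in the style of the system-wide MCA parameters file."""
    return "\n".join(f"{fw} = {comp}" for fw, comp in sorted(selections.items()))


if __name__ == "__main__":
    print(to_param_file(expand_environment("lsf")))
```

With this design, adding a framework means editing one table row inside Open MPI instead of notifying every site to update its parameter file, which is the maintenance problem Ralph raises with the per-framework approach.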
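Jeff's point about run-time detection can be illustrated as a check for a sentinel variable that the batch system sets inside a job, with an explicit override taking priority over detection. This is an illustrative sketch, not Open MPI's selection logic. The variable names (`PBS_ENVIRONMENT` for PBS/Torque's TM interface, `LSB_JOBID` for LSF, `SLURM_JOBID` for SLURM) are ones those batch systems commonly export, but it is an assumption that Open MPI's components check exactly these. The `override` argument stands in for a hypothetical sysadmin-set "enviro" value, and the "rsh" fallback is also an assumption.

```python
import os
from typing import Mapping, Optional

# Assumed sentinel variables, checked in order. Whether Open MPI's own
# components test exactly these variables is an assumption.
SENTINELS = [
    ("tm", "PBS_ENVIRONMENT"),
    ("lsf", "LSB_JOBID"),
    ("slurm", "SLURM_JOBID"),
]


def detect_environment(env: Mapping[str, str],
                       override: Optional[str] = None) -> str:
    """Return the launch environment to use.

    An explicit override (e.g. a hypothetical sysadmin-set 'enviro=lsf')
    always wins; otherwise the first sentinel present selects the
    environment, and 'rsh' is the assumed fallback when none is found."""
    if override:
        return override
    for name, var in SENTINELS:
        if var in env:
            return name
    return "rsh"


if __name__ == "__main__":
    print(detect_environment(os.environ))
```

Passing the environment in as a mapping keeps the function testable without touching the real process environment. In an rsh-less build, which is what Bogdan asks about, the final fallback is where that choice would appear: it would have to report an error instead of returning "rsh".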