If there's a sleep(1) in the run-time test, that would be an annoying source of delay in the startup of a job. This is not a deal-breaker, but it would be nice(r) if there were a "fast" run-time check that could be run during the sysv selection logic (i.e., sysv could disqualify itself if the feature is not available at runtime). Keep in mind that the run-time check will be run in parallel across the whole job, so it's (more or less) a constant amount of time that is added to job startup.
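To make the "fast" check concrete, here is a rough sketch of a probe with no sleep in it. It only verifies that a private SysV segment can be created, attached, touched, and released; it does not fork a child or test attach-after-removal behavior. The name sysv_shm_probe and the 4096-byte size are made up for illustration and are not part of the OMPI code base.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Hypothetical helper (not an Open MPI function).
 * Returns 1 if a private System V segment can be created, attached,
 * written, and released; returns 0 otherwise.  No sleep() involved. */
int sysv_shm_probe(void)
{
    const size_t len = 4096;   /* arbitrary small size for the probe */
    int ok = 0;

    /* IPC_PRIVATE always creates a new segment, so no key collisions. */
    int id = shmget(IPC_PRIVATE, len, IPC_CREAT | IPC_EXCL | 0600);
    if (id < 0) {
        return 0;              /* e.g. out of segments, or SysV shm unavailable */
    }

    void *addr = shmat(id, NULL, 0);
    if (addr != (void *) -1) {
        volatile char *p = addr;
        p[0] = 42;             /* make sure the mapping is really usable */
        ok = (p[0] == 42);
        shmdt(addr);
    }

    /* Always mark the segment for removal so the probe never leaks one. */
    if (shmctl(id, IPC_RMID, NULL) != 0) {
        ok = 0;
    }
    return ok;
}
```

A component could call something like this from its selection/query path and disqualify itself when it returns 0.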
One thing to be careful about with a run-time check is that you might not want *all* processes on a box to try to alloc a sysv segment, fork a child, try to connect, ...etc. With large core-count boxen, you might run out of sysv shmem segments if all procs try the test and/or run into OS serialization issues (someone here at the Forum cited a 96 core box). So you might want to have local rank 0 (or the orted? ...but that wouldn't work for srun / direct launch, etc.) do a test and communicate the results to the rest of the local procs -- maybe in the modex?

On May 4, 2010, at 9:14 AM, N.M. Maclaren wrote:

> On May 4 2010, Terry Dontje wrote:
> >Ralph Castain wrote:
> >>
> >>> Is a configure-time test good enough? For example, are all Linuxes
> >>> the same in this regard. That is if you built OMPI on RH and it
> >>> configured in the new SysV SM will those bits actually run on other
> >>> Linux systems correctly? I think Jeff had hinted to this similarly
> >>> when suggesting this may need to be a runtime test.
> >>
> >> I don't think we have ever enforced that requirement, nor am I sure
> >> the current code would meet it. We have a number of components that
> >> test for ability to build, but don't check again at run-time.
> >>
> >> Generally, the project has followed the philosophy of "build on the
> >> system you intend to run on".
> >>
> >There is at least one binary distribution that does build on one linux
> >and allows to be installed on several others. That is the reason I
> >bring up the above. The community can make a stance that that one
> >distribution does not matter for this case or needs to handle it on its
> >own. In the grand scheme of things it might not matter but I wanted to
> >at least stand up and be heard.
>
> There is a gradation involved. Building on one distribution and using
> on another is one thing. But the same distribution can use differently
> built kernels, and the same system can be reconfigured (including both
> package updating and parameter changing). It is highly undesirable to
> use volatile parameters in non-volatile context.
>
> A lot of applications need rebuilding when the administrator updates
> packages or makes configuration changes; that's not good and should be
> avoided if at all possible. Given the way that systems are currently
> configured, and the design of the autoconfigure mechanism, it's probably
> not wholly avoidable. But it's still a very nasty gotcha.
>
>
> Regards,
> Nick Maclaren.

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/