If there's a sleep(1) in the run-time test, that would be an annoying source of
delay in job startup. This is not a deal-breaker, but it would be nice(r) if
there were a "fast" run-time check that could be run during the sysv selection
logic (i.e., sysv could disqualify itself if the feature is not available at
run time). Keep in mind that the run-time check will be run in parallel across
the whole job, so it adds a (more or less) constant amount of time to job
startup.
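
A minimal sketch of such a fast check follows. It assumes that the run-time
feature in question is Linux's ability to shmat() a SysV segment that has
already been marked with IPC_RMID (an assumption, since the thread doesn't
spell it out), and sysv_rmid_attach_supported() is a hypothetical name, not an
existing Open MPI function. It needs no sleep() and no fork(), only one small
segment and a handful of system calls.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if this host lets a process attach a SysV
 * segment after it has been marked with IPC_RMID (Linux-specific behavior),
 * 0 otherwise.  Assumes that is the feature sysv depends on. */
int sysv_rmid_attach_supported(void)
{
    int ok = 0;
    long pagesz = sysconf(_SC_PAGESIZE);
    size_t size = pagesz > 0 ? (size_t)pagesz : 4096;

    /* One small private segment; failure here means sysv is unusable anyway
     * (not supported, or segment limits already hit). */
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id == -1)
        return 0;

    void *first = shmat(id, NULL, 0);
    if (first == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return 0;
    }

    /* Mark for removal while still attached; the segment itself survives
     * until the last detach. */
    int rmid_ok = (shmctl(id, IPC_RMID, NULL) == 0);
    if (rmid_ok) {
        /* Linux permits this second attach; other systems typically fail. */
        void *second = shmat(id, NULL, 0);
        if (second != (void *)-1) {
            ok = 1;
            shmdt(second);
        }
    }

    /* Last detach: the segment is destroyed if IPC_RMID succeeded. */
    shmdt(first);
    if (!rmid_ok)
        shmctl(id, IPC_RMID, NULL);  /* best-effort cleanup */
    return ok;
}
```

The sysv selection logic could call this once and disqualify the component when
it returns 0. Note that every caller still creates one segment, which is the
concern raised in the next paragraph.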

One thing to be careful of with a run-time check is that you might not want
*all* processes on a box to try to allocate a sysv segment, fork a child, try
to connect, etc. With high core-count boxen, you might run out of sysv shmem
segments if all procs try the test, and/or run into OS serialization issues
(someone here at the Forum cited a 96-core box). So you might want to have
local rank 0 (or the orted? ...but that wouldn't work for srun / direct launch,
etc.) do the test and communicate the results to the rest of the local procs --
maybe in the modex?
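
A minimal sketch of the "local rank 0 probes, everyone else reads" idea, with
the modex modeled as a plain array of per-process records. struct modex_entry,
sysv_value_to_publish() and node_sysv_result() are hypothetical illustrations,
not Open MPI's actual modex API; the probe passed in would be whatever fast
run-time check sysv ends up using.

```c
#include <stddef.h>

/* Hypothetical modex record: one entry per process, exchanged at startup. */
struct modex_entry {
    int node_id;     /* which box the process runs on */
    int local_rank;  /* rank among the processes on that box */
    int sysv_ok;     /* 1/0 from the probe, or -1 if this proc did not probe */
};

/* Value a process would publish: only local rank 0 pays for the probe,
 * so each box creates at most one test segment. */
int sysv_value_to_publish(int local_rank, int (*probe)(void))
{
    return local_rank == 0 ? probe() : -1;
}

/* After the exchange, any process looks up its own box's local-rank-0 entry.
 * Returns 1/0 if found, -1 if no prober result is present for that box. */
int node_sysv_result(const struct modex_entry *entries, size_t n, int node_id)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i].node_id == node_id && entries[i].local_rank == 0)
            return entries[i].sysv_ok;
    }
    return -1;
}
```

A process that finds no prober entry for its box (-1) could conservatively
treat sysv as unavailable and let another shared-memory component be selected.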



On May 4, 2010, at 9:14 AM, N.M. Maclaren wrote:

> On May 4 2010, Terry Dontje wrote:
> >Ralph Castain wrote:
> >>
> >>> Is a configure-time test good enough?  For example, are all Linuxes
> >>> the same in this regard?  That is, if you built OMPI on RH and it
> >>> configured in the new SysV SM, will those bits actually run correctly
> >>> on other Linux systems?  I think Jeff hinted at this as well when
> >>> suggesting this may need to be a run-time test.
> >>
> >> I don't think we have ever enforced that requirement, nor am I sure
> >> the current code would meet it. We have a number of components that
> >> test for ability to build, but don't check again at run-time.
> >>
> >> Generally, the project has followed the philosophy of "build on the
> >> system you intend to run on".
> >>
> >There is at least one binary distribution that is built on one Linux
> >and can be installed on several others.  That is the reason I bring
> >up the above.  The community can take the stance that that one
> >distribution does not matter for this case, or that it needs to handle
> >it on its own.  In the grand scheme of things it might not matter, but
> >I wanted to at least stand up and be heard.
> 
> There is a gradation involved.  Building on one distribution and using
> on another is one thing.  But the same distribution can use differently
> built kernels, and the same system can be reconfigured (including both
> package updating and parameter changing).  It is highly undesirable to
> use volatile parameters in non-volatile context.
> 
> A lot of applications need rebuilding when the administrator updates
> packages or makes configuration changes; that's not good and should be
> avoided if at all possible.  Given the way that systems are currently
> configured, and the design of the autoconfigure mechanism, it's probably
> not wholly avoidable.  But it's still a very nasty gotcha.
> 
> 
> Regards,
> Nick Maclaren.


-- 
Jeff Squyres
jsquy...@cisco.com
