On Wed, Oct/17/2007 07:45:53AM, Jeff Squyres wrote:
> On Oct 16, 2007, at 6:36 PM, Ethan Mallove wrote:
> 
> >>> The bail-out condition is that "make" will eventually either
> >>> succeed or fail with something other than "interrupted system
> >>> call". Do we need another condition?
> >>
> >> I'm just worried that Sun's NFS will somehow get in a
> >> perpetual "interrupted system call" loop such that you'll
> >> never actually break out of it.
> >
> > How about a counter? E.g., after "x" number of "interrupted
> > system call" messages, MTT moves on. Or should the "command
> > restart" go in DoCommand.pm so we can have a timeout?
>
> Either or both of those would be fine (don't we have a timeout in  
> DoCommand.pm already?).

There is a timeout in DoCommand, but currently I re-invoke
DoCommand on each "interrupted system call", so the timeout
gets reset each time. That would not be the case if the
do-while loop were moved into DoCommand. Then again, a
command that is *expected* to output "interrupted system
call" would be certain to loop forever.
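A minimal sketch of the "counter plus overall timeout" idea, in Python for illustration rather than MTT's Perl. `run_with_retry`, `max_retries`, `total_timeout` and the `runner` hook are hypothetical names, not part of DoCommand.pm. The deadline is computed once before the first attempt, so retries cannot reset it. Only a run that exits non-zero *and* mentions "interrupted system call" is retried.

```python
import subprocess
import time

# Text that triggers a retry (compared case-insensitively).
EINTR_MSG = "interrupted system call"


def _default_runner(cmd, timeout):
    """Run cmd once, capturing stdout and stderr together."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )
    return proc.returncode, proc.stdout


def run_with_retry(cmd, max_retries=5, total_timeout=3600.0, runner=_default_runner):
    """Hypothetical helper: re-run cmd while it FAILS with EINTR_MSG.

    Bounded two ways: at most max_retries re-runs, and one overall
    deadline computed up front, so retries never reset the clock.
    Returns (exit_status, output, attempts).
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    deadline = time.monotonic() + total_timeout
    for attempt in range(1, max_retries + 2):  # 1 initial run + retries
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"{cmd!r} exceeded {total_timeout}s in total")
        # Each attempt only gets whatever time is left overall.
        status, output = runner(cmd, remaining)
        # Success, or a failure unrelated to EINTR: stop retrying.
        if status == 0 or EINTR_MSG not in output.lower():
            return status, output, attempt
    # Retry counter exhausted: report the last (EINTR) failure.
    return status, output, attempt
```

For example, `run_with_retry(["make", "-j", "4"])`. Because a zero exit status ends the loop, a build that merely prints the message but succeeds is not re-run. A command that legitimately fails with that text still spins, but only up to `max_retries` times instead of forever.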

-Ethan

> 
> > I also noticed that our build script (which prints hundreds
> > of "interrupted system call" messages per build, but does
> > not seem to die because of them) uses "make -j 24" while MTT
> > has been using "make -j 4". I'll experiment with -j.
> 
> I know that Terry/Sun and co. spent a good amount of time trying to  
> solve the "interrupted system call" error -- they may have some more  
> information for you, such as how/why it happens...?
> 
> -- 
> Jeff Squyres
> Cisco Systems
> 
> _______________________________________________
> mtt-devel mailing list
> mtt-de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
