On Tue, Oct/16/2007 05:37:18PM, Jeff Squyres wrote:
> On Oct 16, 2007, at 5:23 PM, Ethan Mallove wrote:
> 
> > The bail-out is that "make" will eventually either succeed
> > or fail with something other than "interrupted system
> > call". Do we need another condition?
> 
> I'm just worried that Sun's NFS will somehow get in a
> perpetual "interrupted system call" loop such that you'll
> never actually break out of it.


How about a counter? E.g., after "x" number of "interrupted
system call" messages, MTT moves on. Or should the "command
restart" go in DoCommand.pm so we can have a timeout?
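
To make the counter idea concrete, here is a rough sketch that has not
been tested inside MTT. $max_restarts, fake_cmd and run_with_restarts
are made-up names: fake_cmd only stands in for MTT::DoCommand::Cmd so
the snippet runs on its own, and I'm assuming a zero exit_status means
success instead of calling wsuccess.

```perl
use strict;
use warnings;

# Hypothetical cap on attempts; the value is arbitrary.
my $max_restarts = 10;

# Stand-in for MTT::DoCommand::Cmd (assumption, so the sketch is
# self-contained). It fails with EINTR noise twice, then succeeds.
my $calls = 0;
sub fake_cmd {
    my ($cmd) = @_;
    $calls++;
    return $calls < 3
        ? { exit_status => 256, result_stderr => "make: Interrupted system call" }
        : { exit_status => 0,   result_stderr => "" };
}

# Run $cmd, restarting only when it fails with "interrupted system
# call", and stop after $max attempts in total.
sub run_with_restarts {
    my ($cmd, $max) = @_;
    my ($x, $tries) = (undef, 0);
    do {
        $x = fake_cmd($cmd);
        $tries++;
    } while ($x->{exit_status} != 0
             and $x->{result_stderr} =~ /interrupted system call/i
             and $tries < $max);
    return ($x, $tries);
}

my ($x, $tries) = run_with_restarts("make install", $max_restarts);
print "exit_status=$x->{exit_status} after $tries tries\n";
```

If the cap is reached, the last (failed) result is returned, so the
caller reports the build as failed and moves on. A timeout would bound
wall-clock time rather than the number of attempts, and would have to
live wherever the command is actually run.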

I also noticed that our build script (which prints hundreds
of "interrupted system call" messages per build, but does
not seem to die because of them) uses "make -j 24" while MTT
has been using "make -j 4". I'll experiment with -j.

-Ethan


> 
> > I do not know which system call is getting interrupted, but
> > here's an interesting article on how different Unixes deal
> > with connect() interruptions:
> >
> >   http://www.madore.org/~david/computers/connect-intr.html
> >
> > -Ethan
> >
> >
> > On Tue, Oct/16/2007 04:59:29PM, Jeff Squyres wrote:
> >> Ick!
> >>
> >> This is a long-known problem [apparently] with Sun's NFS,
> >> unfortunately.  :-(
> >>
> >> I'd be ok with this if there is an eventual bail out of the loop --
> >> the prospect of an infinite loop is a bit scary for me.
> >>
> >>
> >> On Oct 16, 2007, at 11:23 AM, Ethan Mallove wrote:
> >>
> >>> On certain NFS servers, I run into the error message
> >>> "Interrupted system call" when executing long running
> >>> commands such as "make all". One solution I've been able to
> >>> use is to set up an NFS mount point solely for the cluster
> >>> I'm using, but this is not always an option. The link below
> >>> advises restarting the build on "Interrupted system call":
> >>>
> >>>   http://developers.sun.com/solaris/articles/parallel_make.html
> >>>
> >>> I wrapped the GNU_Install.pm make commands in a do-while to
> >>> effect the build restarts. E.g.,
> >>>
> >>>   do {
> >>>       $x = MTT::DoCommand::Cmd("make install")
> >>>   } while (!MTT::DoCommand::wsuccess($x->{exit_status}) and
> >>>            ($x->{result_stderr} =~ /interrupted system call/i));
> >>>
> >>> As long as make emits "interrupted system call" and fails,
> >>> MTT will keep restarting make.
> >>>
> >>> I realize this is ugly, but is it acceptable?
> >>>
> >>> -Ethan
> >>> _______________________________________________
> >>> mtt-devel mailing list
> >>> mtt-de...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> >>
> >>
> >> -- 
> >> Jeff Squyres
> >> Cisco Systems
> >>
> 
> 
> -- 
> Jeff Squyres
> Cisco Systems
> 