On Tue, Oct/16/2007 05:37:18PM, Jeff Squyres wrote:
> On Oct 16, 2007, at 5:23 PM, Ethan Mallove wrote:
>
> > The bail is that "make" will eventually succeed or fail
> > with something other than "interrupted system call". Do
> > we need another condition?
>
> I'm just worried that Sun's NFS will somehow get in a
> perpetual "interrupted system call" loop such that you'll
> never actually break out of it.

How about a counter? E.g., after "x" number of "interrupted
system call" messages, MTT moves on. Or should the "command
restart" go in DoCommand.pm so we can have a timeout?

I also noticed that our build script (which prints hundreds
of "interrupted system call" messages per build, but does
not seem to die because of them) uses "make -j 24" while MTT
has been using "make -j 4". I'll experiment with -j.

-Ethan

>
> > I do not know which system call is getting interrupted, but
> > here's an interesting article on how different Unixes deal
> > with connect() interruptions:
> >
> > http://www.madore.org/~david/computers/connect-intr.html
> >
> > -Ethan
> >
> >
> > On Tue, Oct/16/2007 04:59:29PM, Jeff Squyres wrote:
> >> Ick!
> >>
> >> This is a long-known problem [apparently] with Sun's NFS,
> >> unfortunately. :-(
> >>
> >> I'd be ok with this if there is an eventual bail out of the loop --
> >> the prospect of an infinite loop is a bit scary for me.
> >>
> >>
> >> On Oct 16, 2007, at 11:23 AM, Ethan Mallove wrote:
> >>
> >>> On certain NFS servers, I run into the error message
> >>> "Interrupted system call" when executing long running
> >>> commands such as "make all". One solution I've been able to
> >>> use is to setup an NFS mount point solely for the cluster
> >>> I'm using, but this is not always an option. The below link
> >>> advises to restart the build on "Interrupted system call":
> >>>
> >>> http://developers.sun.com/solaris/articles/parallel_make.html
> >>>
> >>> I wrapped the GNU_Install.pm make commands in a do-while to
> >>> effect the build restarts. E.g.,
> >>>
> >>> do {
> >>>     $x = MTT::DoCommand::Cmd("make install")
> >>> } while (!MTT::DoCommand::wsuccess($x->{exit_status}) and
> >>>          ($x->{result_stderr} =~ /interrupted system call/i));
> >>>
> >>> As long as make emits "interrupted system call" and fails,
> >>> MTT will keep restarting make.
> >>>
> >>> I realize this is ugly, but is it acceptable?
> >>>
> >>> -Ethan
> >>> _______________________________________________
> >>> mtt-devel mailing list
> >>> mtt-de...@open-mpi.org
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel
> >>
> >>
> >> --
> >> Jeff Squyres
> >> Cisco Systems
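
For reference, the counter idea raised in the top reply could look roughly like the sketch below. This is only an illustration under stated assumptions, not MTT code: run_with_eintr_retries, $max_attempts, and the canned $fake_make results are all hypothetical names made up here. The hash keys exit_status and result_stderr are the ones used in the do-while quoted in the thread, and the plain "exit_status != 0" check is a stand-in for MTT::DoCommand::wsuccess.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MTT): rerun a command while it fails with
# "interrupted system call", but give up after $max_attempts tries so the
# loop can never spin forever.
# $run is a code ref returning a hash ref with the keys used in the thread:
# exit_status and result_stderr.
sub run_with_eintr_retries {
    my ($run, $max_attempts) = @_;
    my ($x, $attempts) = (undef, 0);

    do {
        $x = $run->();
        $attempts++;
    } while ($attempts < $max_attempts
             and $x->{exit_status} != 0    # stand-in for !wsuccess(...)
             and $x->{result_stderr} =~ /interrupted system call/i);

    return ($x, $attempts);
}

# Stand-in for the real make command: fails twice with the NFS error,
# then succeeds. These results are canned for illustration only.
my @canned = (
    { exit_status => 256, result_stderr => "make: Interrupted system call\n" },
    { exit_status => 256, result_stderr => "make: Interrupted system call\n" },
    { exit_status => 0,   result_stderr => "" },
);
my $fake_make = sub { return shift @canned };

my ($result, $attempts) = run_with_eintr_retries($fake_make, 10);
print "attempts=$attempts exit_status=$result->{exit_status}\n";
# prints: attempts=3 exit_status=0
```

In GNU_Install.pm the code ref would presumably wrap the existing call, e.g. sub { MTT::DoCommand::Cmd("make install") }, with the limit read from wherever MTT keeps such settings. A cap on attempts bounds the number of restarts but not wall-clock time, so the timeout question about DoCommand.pm stays open.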