A better setup would be to have a loop which did the following:

-- for a given version number and step, check for STDERR, STDOUT and DONE
-- if they are all found, exit
-- otherwise sleep and recheck

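A minimal sketch of that loop, with an overall time limit so it cannot spin forever. It is written in Python for illustration rather than in the Perl of experiment.perl, and it is not the EMS code: the function name `wait_for_step_files`, the `prefix` argument, and the default poll and timeout values are all assumptions. The only thing taken from the thread is that a step leaves `.STDERR`, `.STDOUT` and `.DONE` files.

```python
import os
import time


def wait_for_step_files(prefix, suffixes=("STDERR", "STDOUT", "DONE"),
                        poll_seconds=5.0, timeout_seconds=300.0):
    """Poll until every <prefix>.<suffix> file exists.

    `prefix` is a hypothetical per-step, per-version path stem; the real
    EMS naming scheme may differ. Returns True once all files are visible,
    False if the overall time limit runs out first.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        # Re-check all files on every pass; on a slow network file system
        # they may become visible in any order.
        missing = [s for s in suffixes
                   if not os.path.exists(f"{prefix}.{s}")]
        if not missing:
            return True
        # Overall limit, so a genuinely crashed step is still reported.
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
```

A caller would treat a `False` return as "the step really did crash" and a `True` return as "safe to go on and inspect the output". Since delays of a couple of minutes were reported, the timeout probably needs to be several minutes at least.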
(and put some limit overall to prevent an endless loop)

Miles

On 2 September 2010 11:16, Hieu Hoang <[email protected]> wrote:
> sounds like a bad case of a network file system. you prob need to
> harass your sysadmin and try a few of these too
> http://fixunix.com/nfs/61890-forcing-nfs-sync.html
>
> On 02/09/2010 04:09, Suzy Howlett wrote:
>> Hi everyone,
>>
>> I'm running Moses through its experiment management system across a
>> cluster and I'm finding that sometimes jobs will finish successfully but
>> the .STDERR and .STDOUT files will be slow in appearing relative to the
>> .DONE file, meaning that the EMS concludes that the step crashed. I can
>> run the system again and it successfully reuses the results of the step
>> (it doesn't have to rerun the step) but this is becoming frustrating as
>> I have to restart the system frequently. I tried adding a call to
>> sleep() in the check_if_crashed() method in experiment.perl but this is
>> not helping in general - I think sometimes the delay is as much as a
>> couple of minutes.
>>
>> Has anyone else faced this problem, or have a better idea for how to get
>> around it?
>>
>> Cheers,
>> Suzy
>>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
