A better setup would be a loop that does the following:

--for a given version number and step, check for the STDERR, STDOUT and DONE files
--if all three are found, exit
--otherwise sleep and recheck

(and put an overall limit on the rechecks to prevent an endless loop)
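
The loop described above could be sketched roughly as follows. This is a standalone Python illustration, not the actual code in experiment.perl; the function name wait_for_step_files, the prefix argument (the path shared by a step's three files) and the default timeout and interval values are all assumptions made for the example.

```python
import os
import time


def wait_for_step_files(prefix, timeout=300.0, interval=5.0,
                        suffixes=(".STDERR", ".STDOUT", ".DONE")):
    """Poll until prefix + suffix exists for every suffix.

    Returns True once all files are visible, or False if `timeout`
    seconds pass first. `prefix` is a hypothetical path stem for one
    version number and step; adapt it to the real file layout.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Recheck every expected file on each pass.
        missing = [s for s in suffixes if not os.path.exists(prefix + s)]
        if not missing:
            return True          # all found: exit
        if time.monotonic() >= deadline:
            return False         # overall limit reached: give up
        time.sleep(interval)     # otherwise sleep and recheck
```

A caller would treat the step as crashed only when this returns False, so a delay of a couple of minutes on the network file system is tolerated as long as it stays under the timeout.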

Miles

On 2 September 2010 11:16, Hieu Hoang <[email protected]> wrote:
> Sounds like a bad case of a network file system. You probably need to
> harass your sysadmin and try a few of these too:
>    http://fixunix.com/nfs/61890-forcing-nfs-sync.html
>
> On 02/09/2010 04:09, Suzy Howlett wrote:
>> Hi everyone,
>>
>> I'm running Moses through its experiment management system across a
>> cluster and I'm finding that sometimes jobs will finish successfully but
>> the .STDERR and .STDOUT files will be slow in appearing relative to the
>> .DONE file, meaning that the EMS concludes that the step crashed. I can
>> run the system again and it successfully reuses the results of the step
>> (it doesn't have to rerun the step) but this is becoming frustrating as
>> I have to restart the system frequently. I tried adding a call to
>> sleep() in the check_if_crashed() method in experiment.perl but this
>> is not helping in general - I think sometimes the delay is as much as
>> a couple of minutes.
>>
>> Has anyone else faced this problem, or have a better idea for how to get
>> around it?
>>
>> Cheers,
>> Suzy
>>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
