Hi Suzy,

This seems quite serious. Do you know if this occurs frequently? Are your files on NFS?
There have been some similar problems reported when running on NFS, caused by filesystem delay. However, those usually produce visible errors that can be dealt with quickly, e.g. by putting in extra waits or explicitly checking for files. Your error isn't detected unless you trawl through the MERT iterations. Rerunning tuning is only meant to counter the random variability in MERT, whereas this would be a definite bug that needs fixing.

On 06/12/2011 07:23, Suzy Howlett wrote:
> Hi all,
>
> I recently found a problem in an old run of a system, which I didn't
> pick up at the time because it failed silently. I'm sending this in the
> hope that someone else can learn from my mistake (and in case anyone has
> a suggestion for how best to catch it in future).
>
> I was running the system on subversion repository revision 3590, using
> the EMS, across a cluster. The cluster uses Torque rather than SGE so
> the qsub commands are slightly different. In particular I have to use
> the -old-sge flags.
>
> During tuning, decoding was split into 10 parts. During run 6, it seems
> that one of the ten "best100" files was slow in appearing, and was not
> incorporated into run6.best100.out. These translations were then not
> available for the next round of tuning, leading the system to converge
> to a completely different (worse) point. The 1-best translations were
> all there, however, so no error was recorded.
>
> If anyone needed any more convincing that you need to run your systems
> more than once, let this be an example.
>
> My best guess at a check for this is that the
> scripts/generic/moses-parallel.pl check_translation_old_sge method needs
> to check that the n-best files have appeared for each split. Does this
> sound right, or is there a better place for the check? (I haven't been
> following updates to Moses for a little while, so if this is all made
> redundant by recent changes, my apologies.)
>
> Suzy
> _______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
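A minimal sketch of the kind of explicit per-split check suggested above. It is written in Python rather than the Perl of moses-parallel.pl, and it assumes a hypothetical naming scheme of `<prefix>.<i>.trans` for each split's 1-best output and `<prefix>.<i>.nbest` for its n-best list (these are not the script's real file names). It polls for each file to allow for NFS delay, then checks that the n-best file covers as many distinct sentence ids as the 1-best file has lines, so a missing or truncated n-best file is reported rather than silently dropped from the merged output.

```python
import os
import time
from typing import List


def wait_for_file(path: str, timeout: float, poll_interval: float = 1.0) -> bool:
    """Poll until `path` exists and is non-empty, or `timeout` seconds pass.

    Polling allows for NFS delay, where a file written on a compute node
    may not yet be visible on the node that merges the splits.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.isfile(path) and os.path.getsize(path) > 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def count_lines(path: str) -> int:
    """Number of lines (i.e. sentences) in a 1-best output file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def count_nbest_ids(path: str) -> int:
    """Count distinct sentence ids in an n-best file.

    Assumes each line looks like 'id ||| hypothesis ||| scores ||| total'.
    """
    ids = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                ids.add(line.split("|||", 1)[0].strip())
    return len(ids)


def check_splits(prefix: str, n_splits: int, timeout: float = 60.0,
                 poll_interval: float = 1.0) -> List[str]:
    """Return a list of problems; an empty list means every split looks complete.

    The names '<prefix>.<i>.trans' and '<prefix>.<i>.nbest' are hypothetical
    placeholders, not the file names moses-parallel.pl actually uses.
    """
    problems = []
    for i in range(n_splits):
        trans = f"{prefix}.{i}.trans"
        nbest = f"{prefix}.{i}.nbest"
        if not wait_for_file(trans, timeout, poll_interval):
            problems.append(f"split {i}: missing or empty 1-best file {trans}")
            continue
        if not wait_for_file(nbest, timeout, poll_interval):
            problems.append(f"split {i}: missing or empty n-best file {nbest}")
            continue
        # A slow or partially written n-best file shows up as a count mismatch.
        n_sents = count_lines(trans)
        n_ids = count_nbest_ids(nbest)
        if n_ids != n_sents:
            problems.append(
                f"split {i}: n-best covers {n_ids} sentences, 1-best has {n_sents}")
    return problems
```

A merge step could call `check_splits(prefix, 10)` before concatenating the per-split n-best files and abort if the returned list is non-empty. Comparing sentence counts rather than exact ids avoids assuming whether the per-split ids start at 0 or are already offset.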
