I have run a new training after reshuffling the corpus. This time the .gz files are not missing, and the training has continued up to step 9. However, when I decode with the final moses.ini, the results are quite bad: most words are treated as unknown (even though they are in the phrase-table), and decoding sometimes terminates with the following error:

terminate called after throwing an instance of 'util::Exception'
  what():  moses/IOWrapper.cpp:273 in std::map<long unsigned int, const Moses::Factor*> Moses::IOWrapper::GetPlaceholders(const Moses::Hypothesis&, Moses::FactorType) threw util::Exception because `targetPos.size() != 1'.
Placeholder should be aligned to 1, and only 1, word

even though I did use the option -extract-options '--Placeholders @num@,@rom@,@alpha@' when launching train-model.perl.

2015-02-24 11:53 GMT+01:00 Tom Hoar <[email protected]>:

> My experience is these warnings are non-fatal. I.e. they do not typically
> cause missing .gz word alignment files. Rather, they indicate a high degree
> of complexity in the parallel corpus.
>
> Are you still getting zero-length (20 bytes) .gz alignment files?
>
> On 02/24/2015 04:35 PM, Vito Mandorino wrote:
>
> Thank you, I did have indeed a bad ratio in a previous try but not in
> this one. I have launched the clean-corpus.perl script right before the
> training.
> In the log output there are actually several lines with warnings such as:
>
> PROBLEM: alignment is 0.
> WARNING: Hill Climbing yielded a zero score viterbi alignment for the
> following pair:
> WARNING: Model2 viterbi alignment has zero score.
> Fert[50] selected WARNING: Model2 viterbi alignment has zero score.
> WARNING: already 41 iterations in hillclimb: 3.24683 2 26 29
> WARNING: DIFFERENT SUMS: (1) (3.13379)
>
> Vito
>
> 2015-02-20 19:18 GMT+01:00 Tom Hoar <[email protected]>:
>
>> Fatal errors during step 2 are normally traceable to poor corpus
>> preparation. Termination, however, does not always happen immediately. Look
>> through the entire log output. You'll probably find one of these errors:
>>
>> "WARNING: The following sentence pair has source/target sentence length
>> ration more than"
>>
>> or "ERROR: Forbidden zero sentence length 0"
>>
>> or a line beginning with "ERROR:"
>>
>> The fact that your corpus has placeholders makes me suspect you probably
>> have a bad ratio.
>>
>> On 02/20/2015 06:18 PM, Vito Mandorino wrote:
>>
>> Dear All,
>>
>> I am training a model with placeholders from French to English and the
>> process ends before the end of training step 2, without creating the
>> en-fr.AR.final.gz and fr-en.AR.final.gz files in the respective
>> folders giza.en-fr and giza.fr-en .
>>
>> I cannot understand why. Here's the last 7 lines of the training.out
>> file:
>>
>> Entire Viterbi H333444 Training took: 78424 seconds
>> ==========================================================
>>
>> Entire Training took: 134751 seconds
>> Program Finished at: Fri Feb 20 05:45:19 2015
>>
>> ==========================================================
>>
>> and here's the command used for training:
>>
>> nohup nice /home/Moses/mosesdecoder/scripts/training/train-model.perl \
>> --parallel \
>> -mgiza -mgiza-cpus 20 \
>> -root-dir /home/BRIQUES/train-leclerc-light-ph-fren/training \
>> -external-bin-dir /root/external-bin-dir/ \
>> -corpus /home/BRIQUES/train-leclerc-light-ph-enfr/data/Corpus.leclerc-ph.cleanclean -f fr -e en \
>> -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
>> -lm 0:5:/home/DATA/LM/RapAnSoc/lm.RapAnSoc5-ph.blm.en.mm:8 \
>> -write-lexical-counts \
>> -extract-options '--Placeholders @num@,@rom@,@alpha@' \
>> >& /home/BRIQUES/train-leclerc-light-ph-fren/training/training.out &
>>
>> It seems a bit odd to me that the analogous training in the reversed
>> English-French direction has worked nicely.
>> In fact, in this case the folders giza.en-fr and giza.fr-en do contain
>> not only the .gz files but also two more "chunks" en-fr.A3.final.partD and
>> en-fr.A3.final.partE, whereas the French-English training stops at
>> fr-en.A3.final.partC.
>>
>> Thank you,
>>
>> Vito Mandorino

--
M. Vito MANDORINO -- Chief Scientist
The Translation Trustee
1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
Tel : +33 1 30 44 04 23 Mobile : +33 6 84 65 68 89
Email : [email protected]
Website : www.linguacustodia.com - www.thetranslationtrustee.com
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
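
The advice earlier in the thread, to look through the entire training log for the quoted warning and error messages, could be automated roughly as follows. This is only a sketch: the message fragments are the ones quoted in this thread, while the labels and the function names (scan_log, scan_log_file) are hypothetical and not part of Moses or GIZA++.

```python
import re
from collections import Counter

# Message fragments quoted in this thread. The labels on the left are
# hypothetical names chosen for this sketch, not Moses/GIZA++ identifiers.
PATTERNS = {
    "length_ratio": re.compile(r"source/target sentence length"),
    "zero_length": re.compile(r"Forbidden zero sentence length"),
    "error_line": re.compile(r"^\s*ERROR:"),
    "zero_score_viterbi": re.compile(r"viterbi alignment has zero score"),
    "zero_alignment": re.compile(r"PROBLEM: alignment is 0"),
}


def scan_log(lines):
    """Count how many lines match each known message fragment."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


def scan_log_file(path):
    """Scan a whole log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return scan_log(handle)
```

Calling scan_log_file on the training.out file and printing the returned counts gives a quick overview of which of these messages occur, and how often, without reading the whole log by hand. It does not say which of them, if any, is responsible for a failed run.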
