Hi,

I have compiled Moses, Giza & SRILM on Fedora Core 11 using the steps described in http://www.statmt.org/moses_steps.html and other Moses support links.

While training on my parallel corpus of English and Hindi (~100,000 sentences each), I get the error shown below when I execute:

nohup nice ./tools/moses-scripts/scripts-20091002-0031//training/train-factored-phrase-model.perl -scripts-root-dir ./tools/moses-scripts/scripts-20091002-0031/ -root-dir work3 -corpus ./work3/corpus/IRL-clean -f hi2 -e en2 -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:/home/danish/FIRE2010/work3/lm/IRL-en.lm >& ./work3/training.out &

At one step of the training process, I get the following error and the tool quits.

*Last few lines of output (training.out):*

Use of uninitialized value $a in split at ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl line 856.
Use of uninitialized value $a in scalar chomp at ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl line 853.
Use of uninitialized value $a in split at ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl line 856.
Use of uninitialized value $a in scalar chomp at ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl line 853.
Use of uninitialized value $a in split at ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl line 856.
Use of uninitialized value $a in scalar chomp at ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl line 853.
Use of uninitialized value $a in split at ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl line 856.
Use of uninitialized value $a in scalar chomp at ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl line 853.
Use of uninitialized value $a in split at ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl line 856.
Use of uninitialized value $a in scalar chomp at ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl line 853.
Use of uninitialized value $a in split at ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl line 856.
Saved: ./work3//model/lex.f2e and ./work3//model/lex.e2f
FILE: ./work3/corpus/IRL-clean.en2
FILE: ./work3/corpus/IRL-clean.hi2
FILE: ./work3//model/aligned.grow-diag-final-and
(5) extract phrases @ Sat Oct 3 02:46:00 IST 2009
./tools/moses-scripts//scripts-20091002-0031//training/phrase-extract/extract ./work3/corpus/IRL-clean.en2 ./work3/corpus/IRL-clean.hi2 ./work3//model/aligned.grow-diag-final-and ./work3//model/extract 7 --NoFileLimit orientation
Executing: ./tools/moses-scripts//scripts-20091002-0031//training/phrase-extract/extract ./work3/corpus/IRL-clean.en2 ./work3/corpus/IRL-clean.hi2 ./work3//model/aligned.grow-diag-final-and ./work3//model/extract 7 --NoFileLimit orientation
PhraseExtract v1.4, written by Philipp Koehn
phrase extraction from an aligned parallel corpus
.........Executing: gzip ./work3//model/extract.inv
gzip: ./work3//model/extract.inv: No such file or directory
Exit code: 1
ERROR at ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl line 963.

My clean sentence files have the extensions hi2 (for Hindi) and en2 (for English). I have tried the solutions available on the Moses support forums for similar problems, but they have not helped.

The following is a listing of the files and folders in my work folder (work3):

*corpus* folder
total 76384
-rw-rw-r--. 1 danish danish 27717737 2009-10-02 23:29 IRL-clean.hi2
-rw-rw-r--. 1 danish danish 11502887 2009-10-02 23:29 IRL-clean.en2
-rw-r--r--. 1 root root 1781671 2009-10-03 17:44 hi2.vcb.classes
-rw-r--r--. 1 root root 1579583 2009-10-03 17:44 hi2.vcb.classes.cats
-rw-r--r--. 1 root root 704087 2009-10-03 17:50 en2.vcb.classes
-rw-r--r--. 1 root root 534277 2009-10-03 17:50 en2.vcb.classes.cats
-rw-r--r--. 1 root root 2158362 2009-10-03 17:50 hi2.vcb
-rw-r--r--. 1 root root 1013926 2009-10-03 17:50 en2.vcb
-rw-r--r--. 1 root root 15605740 2009-10-03 17:50 hi2-en2-int-train.snt
-rw-r--r--. 1 root root 15605740 2009-10-03 17:51 en2-hi2-int-train.snt

*giza.en2-hi2* folder
total 124088
-rw-r--r--. 1 root root 109989326 2009-10-03 18:44 en2-hi2.cooc
-rw-r--r--. 1 root root 1651 2009-10-03 18:44 en2-hi2.gizacfg
-rw-r--r--. 1 root root 17070807 2009-10-03 19:22 en2-hi2.A3.final.gz

*giza.hi2-en2* folder
total 124052
-rw-r--r--. 1 root root 110088686 2009-10-03 17:51 hi2-en2.cooc
-rw-r--r--. 1 root root 1651 2009-10-03 17:51 hi2-en2.gizacfg
-rw-r--r--. 1 root root 16928263 2009-10-03 18:43 hi2-en2.A3.final.gz

*lm* folder
total 100388
-rw-rw-r--. 1 danish danish 27717737 2009-10-02 23:29 IRL-clean.hi2
-rw-rw-r--. 1 danish danish 11502887 2009-10-02 23:29 IRL-clean.en2
-rw-r--r--. 1 root root 22834140 2009-10-03 17:29 IRL-en.lm
-rw-r--r--. 1 root root 40731568 2009-10-03 17:30 IRL-hi.lm

*model* folder
total 7992
-rw-r--r--. 1 root root 0 2009-10-03 19:23 aligned.grow-diag-final-and
-rw-r--r--. 1 root root 4089006 2009-10-03 19:23 lex.f2e
-rw-r--r--. 1 root root 4089006 2009-10-03 19:23 lex.e2f

I can see that the model folder does not contain the extract.inv file, which seems to cause the error. I have redone the steps three times and hit exactly the same error each time. I have ensured that the parallel text has been lowercased (for English) and cleaned (both English and Hindi).

Could you please help me resolve this issue as soon as possible?

Thank you,
Regards,
Danish Contractor

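P.S. In case it helps with diagnosis: the model folder listing above shows aligned.grow-diag-final-and at 0 bytes, so the extract step may simply have had no alignments to read. Below is a minimal sketch of a check for zero-byte intermediate files. The function name list_empty_files is only an illustration (it is not part of Moses), the path ./work3/model is the one from my run, and it assumes a find that supports -maxdepth (GNU find on Fedora does).

```shell
#!/bin/sh
# Hypothetical helper (not part of Moses): print every zero-byte regular
# file directly inside the given directory, one path per line.
list_empty_files() {
    dir="$1"
    # -maxdepth 1 keeps the search to the directory itself;
    # -size 0 matches files that occupy zero blocks, i.e. empty files.
    find "$dir" -maxdepth 1 -type f -size 0
}

# Check the model folder from the training run, if it exists here.
if [ -d ./work3/model ]; then
    list_empty_files ./work3/model
fi
```

Any file this prints was created by the training script but never filled, which points at the step that wrote it rather than at the later step that failed.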
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
