Hi,

The problem lies in the word alignment step (step 3). You can run that step in
isolation to see in more detail what is going wrong.
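
For a quick first look before rerunning step 3, something like the following
may help. It is only a minimal sketch in Python and not part of Moses: the
function name check_training_files is made up for illustration, and it assumes
the two corpus sides and the alignment file are plain files at the paths you
pass in. It reports a line-count mismatch between the two sides and a missing
or empty alignment file.

```python
import os


def count_lines(path):
    """Count newline-delimited lines without decoding the file."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)


def check_training_files(src_path, tgt_path, align_path):
    """Return a list of problems found in the training files (hypothetical helper)."""
    problems = []

    # Both sides of a parallel corpus must have the same number of lines.
    n_src = count_lines(src_path)
    n_tgt = count_lines(tgt_path)
    if n_src != n_tgt:
        problems.append(f"line count mismatch: {n_src} vs {n_tgt}")

    # An absent or zero-byte alignment file means word alignment produced nothing.
    if not os.path.exists(align_path):
        problems.append("alignment file is missing")
    elif os.path.getsize(align_path) == 0:
        problems.append("alignment file is empty")

    return problems
```

In the listing you sent, aligned.grow-diag-final-and in the model folder has a
size of 0 bytes, so a check like this would flag it. That is consistent with
the alignment step producing no usable output.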

One common problem with word alignment is that GIZA++ is sensitive
to bad data, i.e. empty lines, very long sentences, or an excessive
mismatch in sentence length between the two sides. The
clean-corpus-n.perl script is designed to take care of these problems.
Did you run it on your original corpus?
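
To make the three kinds of bad data concrete, here is a rough sketch in Python
of the sort of filter a cleaning pass applies to each sentence pair. It is not
the code of clean-corpus-n.perl; the function names and the thresholds
(min_len, max_len, max_ratio) are assumptions chosen for illustration.

```python
def keep_pair(src, tgt, min_len=1, max_len=80, max_ratio=9.0):
    """Decide whether a sentence pair is safe to pass to word alignment.

    Thresholds are illustrative assumptions, not the defaults of any tool.
    """
    n_src = len(src.split())
    n_tgt = len(tgt.split())

    # Empty (or too short) lines on either side.
    if n_src < min_len or n_tgt < min_len:
        return False
    # Overly long sentences on either side.
    if n_src > max_len or n_tgt > max_len:
        return False
    # Excessive mismatch in sentence length between the two sides.
    if n_src > max_ratio * n_tgt or n_tgt > max_ratio * n_src:
        return False
    return True


def clean_pairs(pairs, **limits):
    """Keep only the pairs that pass keep_pair, preserving their order."""
    return [(s, t) for s, t in pairs if keep_pair(s, t, **limits)]
```

Dropping a pair always removes the line from both sides at once, so the two
files stay parallel after cleaning.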

-phi

On Sun, Oct 4, 2009 at 6:32 AM, Danish Contractor
<[email protected]> wrote:
> Hi,
>
> I have compiled Moses, Giza & SRILM on Fedora Core 11 using the steps
> described in http://www.statmt.org/moses_steps.html and other Moses support
> links.
>
> While training on my parallel corpus of English and Hindi (~100,000 sentences
> each), I get the error shown below when I execute:
>
> nohup nice
> ./tools/moses-scripts/scripts-20091002-0031//training/train-factored-phrase-model.perl
> -scripts-root-dir ./tools/moses-scripts/scripts-20091002-0031/ -root-dir
> work3 -corpus ./work3/corpus/IRL-clean -f hi2 -e en2 -alignment
> grow-diag-final-and -reordering msd-bidirectional-fe -lm
> 0:3:/home/danish/FIRE2010/work3/lm/IRL-en.lm >& ./work3/training.out &
>
> In one step of the training process, I get the following error and the tool
> quits:
>
> Last few lines of output (training.out) :
>
> Use of uninitialized value $a in split at
> ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl
> line 856.
> Use of uninitialized value $a in scalar chomp at
> ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl
> line 853.
> Use of uninitialized value $a in split at
> ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl
> line 856.
> Use of uninitialized value $a in scalar chomp at
> ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl
> line 853.
> Use of uninitialized value $a in split at
> ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl
> line 856.
> Use of uninitialized value $a in scalar chomp at
> ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl
> line 853.
> Use of uninitialized value $a in split at
> ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl
> line 856.
> Use of uninitialized value $a in scalar chomp at
> ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl
> line 853.
> Use of uninitialized value $a in split at
> ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl
> line 856.
> Use of uninitialized value $a in scalar chomp at
> ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl
> line 853.
> Use of uninitialized value $a in split at
> ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl
> line 856.
>
> Saved: ./work3//model/lex.f2e and ./work3//model/lex.e2f
> FILE: ./work3/corpus/IRL-clean.en2
> FILE: ./work3/corpus/IRL-clean.hi2
> FILE: ./work3//model/aligned.grow-diag-final-and
> (5) extract phrases @ Sat Oct  3 02:46:00 IST 2009
> ./tools/moses-scripts//scripts-20091002-0031//training/phrase-extract/extract
> ./work3/corpus/IRL-clean.en2 ./work3/corpus/IRL-clean.hi2
> ./work3//model/aligned.grow-diag-final-and ./work3//model/extract 7
> --NoFileLimit orientation
> Executing:
> ./tools/moses-scripts//scripts-20091002-0031//training/phrase-extract/extract
> ./work3/corpus/IRL-clean.en2 ./work3/corpus/IRL-clean.hi2
> ./work3//model/aligned.grow-diag-final-and ./work3//model/extract 7
> --NoFileLimit orientation
> PhraseExtract v1.4, written by Philipp Koehn
> phrase extraction from an aligned parallel corpus
> .........Executing: gzip ./work3//model/extract.inv
> gzip: ./work3//model/extract.inv: No such file or directory
> Exit code: 1
> ERROR at
> ./tools/moses-scripts/scripts-20091002-0031/training/train-factored-phrase-model.perl
> line 963.
>
>
> My clean sentence files have the extensions hi2 (for Hindi) and en2 (for
> English).
> I have tried the solutions available on the Moses support forums for similar
> problems, but they have not helped.
>
> The following is a listing of the files & folders in my work folder (work3)
>
> corpus folder
> total 76384
> -rw-rw-r--. 1 danish danish 27717737 2009-10-02 23:29 IRL-clean.hi2
> -rw-rw-r--. 1 danish danish 11502887 2009-10-02 23:29 IRL-clean.en2
> -rw-r--r--. 1 root   root    1781671 2009-10-03 17:44 hi2.vcb.classes
> -rw-r--r--. 1 root   root    1579583 2009-10-03 17:44 hi2.vcb.classes.cats
> -rw-r--r--. 1 root   root     704087 2009-10-03 17:50 en2.vcb.classes
> -rw-r--r--. 1 root   root     534277 2009-10-03 17:50 en2.vcb.classes.cats
> -rw-r--r--. 1 root   root    2158362 2009-10-03 17:50 hi2.vcb
> -rw-r--r--. 1 root   root    1013926 2009-10-03 17:50 en2.vcb
> -rw-r--r--. 1 root   root   15605740 2009-10-03 17:50 hi2-en2-int-train.snt
> -rw-r--r--. 1 root   root   15605740 2009-10-03 17:51 en2-hi2-int-train.snt
>
> giza.en2-hi2 folder
> total 124088
> -rw-r--r--. 1 root root 109989326 2009-10-03 18:44 en2-hi2.cooc
> -rw-r--r--. 1 root root      1651 2009-10-03 18:44 en2-hi2.gizacfg
> -rw-r--r--. 1 root root  17070807 2009-10-03 19:22 en2-hi2.A3.final.gz
>
> giza.hi2-en2 folder
> total 124052
> -rw-r--r--. 1 root root 110088686 2009-10-03 17:51 hi2-en2.cooc
> -rw-r--r--. 1 root root      1651 2009-10-03 17:51 hi2-en2.gizacfg
> -rw-r--r--. 1 root root  16928263 2009-10-03 18:43 hi2-en2.A3.final.gz
>
> lm folder
> total 100388
> -rw-rw-r--. 1 danish danish 27717737 2009-10-02 23:29 IRL-clean.hi2
> -rw-rw-r--. 1 danish danish 11502887 2009-10-02 23:29 IRL-clean.en2
> -rw-r--r--. 1 root   root   22834140 2009-10-03 17:29 IRL-en.lm
> -rw-r--r--. 1 root   root   40731568 2009-10-03 17:30 IRL-hi.lm
>
> model folder
> total 7992
> -rw-r--r--. 1 root root       0 2009-10-03 19:23 aligned.grow-diag-final-and
> -rw-r--r--. 1 root root 4089006 2009-10-03 19:23 lex.f2e
> -rw-r--r--. 1 root root 4089006 2009-10-03 19:23 lex.e2f
>
> I can see that the model folder does not contain the extract.inv file, which
> seems to cause the error. I have redone the steps three times and hit exactly
> the same error each time.
>
> I have ensured that the parallel text has been lowercased (for English) and
> cleaned (both English and Hindi).
> May I request you to kindly help me resolve this issue at the earliest.
> Thanks!
>
> Thank you,
> Regards,
>
> Danish Contractor
>
>
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
