Hello!
I am trying to set up a working installation of Moses for our project. I
followed the guidelines on the Moses Web pages (Baseline System, ...) and was
mostly successful, except for mgiza.

History of what I did:
My system: virtual machine with Ubuntu 14.04 x64, 2 cores, 12 GB of memory.
1. Installed release 3.0 from the Web page.
   Tried the commands from "Baseline System" ==> mgiza fails with signal 11
   (core dump).
2. Compiled and installed the latest version of mgiza from Github.
   Tried the commands from "Baseline System" ==> mgiza fails with signal 11
   (core dump).
3. Compiled and installed the latest version of GIZA++ from Github.
   Tried the commands from "Baseline System" ==> all OK.
4. Compiled and installed the latest versions of moses, GIZA++ and mgiza from
   Github.
   Tried the commands from "Baseline System" ==> OK with GIZA++, fails with
   mgiza.

Basically, I call GIZA++ and mgiza with the same command and the same input
files; the only difference is the following two switches:

GIZAOPT="-mgiza -mgiza-cpus 2"

Command:
$HOME/mosesdecoder/scripts/training/train-model.perl -cores 2 $GIZAOPT 
-root-dir train -corpus $HOME/corpus/news-commentary-v8.fr-en.clean -f fr -e en 
-alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 
0:3:$HOME/lm/news-commentary-v8.fr-en.blm.en:8 -external-bin-dir 
$HOME/mosesdecoder/training-tools 2>&1 > train.out
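
To make the toggle explicit, here is a minimal sketch of how the two variants
of the command are built. The function name build_train_cmd and its argument
are illustrative names of mine, not part of Moses, and the script only prints
the command line instead of running the training.

```shell
#!/bin/sh
# Hypothetical helper (not part of Moses): prints the train-model.perl
# command line with or without the mgiza switches. Nothing is executed.
build_train_cmd() {
    use_mgiza="$1"      # "1" selects mgiza, anything else selects GIZA++
    gizaopt=""
    if [ "$use_mgiza" = "1" ]; then
        gizaopt="-mgiza -mgiza-cpus 2"
    fi
    # $gizaopt is left unquoted on purpose so that it splits into words
    # (and disappears entirely when it is empty).
    echo "$HOME/mosesdecoder/scripts/training/train-model.perl" -cores 2 $gizaopt \
        -root-dir train \
        -corpus "$HOME/corpus/news-commentary-v8.fr-en.clean" -f fr -e en \
        -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
        -lm "0:3:$HOME/lm/news-commentary-v8.fr-en.blm.en:8" \
        -external-bin-dir "$HOME/mosesdecoder/training-tools"
}

build_train_cmd 1   # command line that uses mgiza
build_train_cmd 0   # command line that uses GIZA++
```

The two printed lines differ only in the "-mgiza -mgiza-cpus 2" part.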

When GIZA++ is called (GIZAOPT=""), everything works. When mgiza is called
(GIZAOPT="-mgiza ..."), mgiza fails with:

Executing: $HOME/mosesdecoder/training-tools/mgiza  -CoocurrenceFile 
$HOME/tm/train/giza.fr-en/fr-en.cooc -c 
$HOME/tm/train/corpus/fr-en-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 
-model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 2 -nodumps 1 -nsmooth 4 
-o $HOME/tm/train/giza.fr-en/fr-en -onlyaldumps 1 -p0 0.999 -s 
$HOME/tm/train/corpus/en.vcb -t $HOME/tm/train/corpus/fr.vcb
Starting MGIZA
Initializing Global Paras
DEBUG: EnterERROR: Execution of: $HOME/mosesdecoder/training-tools/mgiza  
-CoocurrenceFile $HOME/tm/train/giza.fr-en/fr-en.cooc -c 
$HOME/tm/train/corpus/fr-en-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 
-model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 2 -nodumps 1 -nsmooth 4 
-o $HOME/tm/train/giza.fr-en/fr-en -onlyaldumps 1 -p0 0.999 -s 
$HOME/tm/train/corpus/en.vcb -t $HOME/tm/train/corpus/fr.vcb
  died with signal 11, with coredump

GIZA++, on the other hand, starts normally:

Executing: $HOME/mosesdecoder/training-tools/GIZA++  -CoocurrenceFile 
$HOME/tm/train/giza.fr-en/fr-en.cooc -c 
$HOME/tm/train/corpus/fr-en-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 
-model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4 -o 
$HOME/tm/train/giza.fr-en/fr-en -onlyaldumps 1 -p0 0.999 -s 
$HOME/tm/train/corpus/en.vcb -t $HOME/tm/train/corpus/fr.vcb
Reading vocabulary file from:$HOME/tm/train/corpus/en.vcb
Reading vocabulary file from:$HOME/tm/train/corpus/fr.vcb
10000
20000
...

What can I do to help determine where mgiza fails and get it up & running?
Sub-question: is it really worth running mgiza instead of GIZA++?
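
Regarding the first question: if a backtrace from the core dump would help, I
could try to produce one roughly as sketched below. This assumes gdb is
installed and that a core file was actually written; the function name
print_backtrace and the core file location are my own assumptions, and the
backtrace is only readable if mgiza was built with debug symbols.

```shell
#!/bin/sh
# Sketch: print a stack backtrace from a core file left by a crashed binary.
# print_backtrace is an illustrative name; both paths come from the caller.
print_backtrace() {
    bin="$1"     # path to the crashed executable, e.g. the mgiza binary
    core="$2"    # path to the core file (location depends on system setup)
    if [ ! -x "$bin" ] || [ ! -f "$core" ]; then
        echo "missing binary or core file: $bin $core" >&2
        return 1
    fi
    # -batch makes gdb exit after running the given commands;
    # "bt" prints the call stack at the point of the crash.
    gdb -batch -ex bt "$bin" "$core"
}
```

I would run "ulimit -c unlimited" in the same shell before repeating the
training, then call something like
print_backtrace $HOME/mosesdecoder/training-tools/mgiza core
with whatever core file path the system actually produces.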

Best regards,
  Matjaz

PS: I changed /home/... to $HOME in the above examples.
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
