Hi Jelita

It could be running out of memory. Under Cygwin, mgiza will be limited 
to 2 GB:
http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9

cheers - Barry
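
P.S. The merge_alignment.py IOError further down is just a downstream symptom: mgiza aborted before writing any id-en.A3.final.part* files, so the shell passed the unmatched glob through literally and Python tried to open the pattern itself as a filename. A minimal sketch of a guard that would make the real failure surface earlier (the temp directory, file names, and the cat stand-in for merge_alignment.py are all illustrative, not the actual wrapper script):

```shell
# Sketch: fail fast when mgiza left no alignment parts behind,
# instead of letting merge_alignment.py choke on a literal glob.
modeldir=$(mktemp -d)   # stand-in for the real model directory

merge_parts() {
    # Expand the glob ourselves; an unmatched pattern stays literal
    # in sh, which is exactly what Python then fails to open.
    set -- "$modeldir"/id-en.A3.final.part*
    if [ ! -e "$1" ]; then
        echo "no alignment parts in $modeldir - mgiza likely aborted" >&2
        return 1
    fi
    cat "$@" > "$modeldir/id-en.A3.final"   # stand-in for merge_alignment.py
}

merge_parts || echo "merge skipped"
```

With the empty stand-in directory this prints the warning and "merge skipped"; once part files exist, the merge proceeds.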

On 06/11/12 07:33, Jelita Asian wrote:
> Hi,
>
> I run Moses training using the moses-for-mere-mortals scripts. The 
> run used to be OK. However, since I increased the number of words 
> (mostly numbers written out as words, added as parallel sentences to 
> the corpus for Indonesian and English), I keep getting an mgiza stack 
> dump, so the training fails.
>
> Here is an extract from the log file of the run:
>
> -----------
> Model1: Iteration 5
> Reading more sentence pairs into memory ...
> [sent:100000]
> Reading more sentence pairs into memory ...
> Reading more sentence pairs into memory ...
> Reading more sentence pairs into memory ...
> Model1: (5) TRAIN CROSS-ENTROPY 5.82453 PERPLEXITY 56.6706
> Model1: (5) VITERBI TRAIN CROSS-ENTROPY 6.59753 PERPLEXITY 96.8401
> Model 1 Iteration: 5 took: 87 seconds
> Entire Model1 Training took: 444 seconds
> NOTE: I am doing iterations with the HMM model!
> Read classes: #words: 48562  #classes: 51
> Actual number of read words: 48561 stored words: 48561
> Read classes: #words: 45484  #classes: 51
> Actual number of read words: 45483 stored words: 45483
>
> ==========================================================
> Hmm Training Started at: Tue Nov  6 12:46:41 2012
>
> ./train-AllCorpusIndo.sh: line 1184:  3936 Aborted                 
> (core dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c 
> $modeldir/$lang2-$lang1-int-train.snt -o $modeldir/$lang2-$lang1 -s 
> $modeldir/$lang1.vcb -t $modeldir/$lang2.vcb -coocurrencefile 
> $modeldir/$lang1-$lang2.cooc -ml $ml -countincreasecutoff 
> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal 
> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff 
> -probcutoff $probcutoff -probsmooth $probsmooth -m1 $model1iterations 
> -m2 $model2iterations -mh $hmmiterations -m3 $model3iterations -m4 
> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1 
> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 
> $transferdumpfrequency -t345 $model345dumpfrequency -th 
> $hmmdumpfrequency -onlyaldumps $onlyaldumps -nodumps $nodumps 
> -compactadtable $compactadtable -model4smoothfactor 
> $model4smoothfactor -compactalignmentformat $compactalignmentformat 
> -verbose $verbose -verbosesentence $verbosesentence -emalsmooth 
> $emalsmooth -model23smoothfactor $model23smoothfactor 
> -model4smoothfactor $model4smoothfactor -model5smoothfactor 
> $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral 
> -deficientdistortionforemptyword $deficientdistortionforemptyword 
> -depm4 $depm4 -depm5 $depm5 -emalignmentdependencies 
> $emalignmentdependencies -emprobforempty $emprobforempty -m5p0 $m5p0 
> -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2 
> -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility 
> $maxfertility -p0 $p0 -pegging $pegging
> Starting MGIZA
> Initializing Global Paras
> DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments
> Parameter 'ncpus' changed from '2' to '8'
> Parameter 'c' changed from '' to 
> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en-int-train.snt'
> Parameter 'o' changed from '112-11-06.124815.Jelita' to 
> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en'
> Parameter 's' changed from '' to 
> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en.vcb'
> Parameter 't' changed from '' to 
> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id.vcb'
> Parameter 'coocurrencefile' changed from '' to 
> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.cooc'
> Parameter 'm3' changed from '5' to '3'
> Parameter 'm4' changed from '5' to '3'
> Parameter 'onlyaldumps' changed from '0' to '1'
> Parameter 'nodumps' changed from '0' to '1'
> Parameter 'model4smoothfactor' changed from '0.2' to '0.4'
> Parameter 'nsmooth' changed from '64' to '4'
> Parameter 'p0' changed from '-1' to '0.999'
> general parameters:
> -------------------
> ml = 101  (maximum sentence length)
>
> Here is another extract:
>
> ./train-AllCorpusIndo.sh: line 1184:  2756 Aborted                 
> (core dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c 
> $modeldir/$lang1-$lang2-int-train.snt -o $modeldir/$lang1-$lang2 -s 
> $modeldir/$lang2.vcb -t $modeldir/$lang1.vcb -coocurrencefile 
> $modeldir/$lang2-$lang1.cooc -ml $ml -countincreasecutoff 
> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal 
> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff 
> -probcutoff $probcutoff -probsmooth $probsmooth -m1 $model1iterations 
> -m2 $model2iterations -mh $hmmiterations -m3 $model3iterations -m4 
> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1 
> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 
> $transferdumpfrequency -t345 $model345dumpfrequency -th 
> $hmmdumpfrequency -onlyaldumps $onlyaldumps -nodumps $nodumps 
> -compactadtable $compactadtable -model4smoothfactor 
> $model4smoothfactor -compactalignmentformat $compactalignmentformat 
> -verbose $verbose -verbosesentence $verbosesentence -emalsmooth 
> $emalsmooth -model23smoothfactor $model23smoothfactor 
> -model4smoothfactor $model4smoothfactor -model5smoothfactor 
> $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral 
> -deficientdistortionforemptyword $deficientdistortionforemptyword 
> -depm4 $depm4 -depm5 $depm5 -emalignmentdependencies 
> $emalignmentdependencies -emprobforempty $emprobforempty -m5p0 $m5p0 
> -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2 
> -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility 
> $maxfertility -p0 $p0 -pegging $pegging
> ****** phase 2.1 of training (merge alignments)
> Traceback (most recent call last):
>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", 
> line 24, in <module>
>     files.append(open(sys.argv[i],"r"));
> IOError: [Errno 2] No such file or directory: 
> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.part*'
> Traceback (most recent call last):
>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", 
> line 24, in <module>
>     files.append(open(sys.argv[i],"r"));
> IOError: [Errno 2] No such file or directory: 
> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.part*'
> ****** Rest of parallel training
> Using SCRIPTS_ROOTDIR: /home/Jelita/moses/tools/moses/scripts
> Using single-thread GIZA
> (3) generate word alignment @ Tue Nov  6 13:07:31 SEAST 2012
> Combining forward and inverted alignment from files:
>   
> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.{bz2,gz}
>   
> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.{bz2,gz}
> Executing: mkdir -p 
> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6
> Executing: 
> /home/Jelita/moses/tools/moses/scripts/training/symal/giza2bal.pl 
> -d "gzip -cd 
> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.gz"
>  
> -i "gzip -cd 
> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.gz"
>  
> |/home/Jelita/moses/tools/moses/scripts/training/symal/symal 
> -alignment="grow" -diagonal="yes" -final="yes" -both="yes" > 
> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/aligned.grow-diag-final-and
> symal: computing grow alignment: diagonal (1) final (1)both-uncovered (1)
> skip=<0> counts=<0>
> (4) generate lexical translation table 0-0 @ Tue Nov  6 13:07:31 SEAST 
> 2012
> (/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.id
> ,/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.en,/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/lex)
> Use of uninitialized value $a in scalar chomp at 
> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 
> 1079.
> Use of uninitialized value $a in split at 
> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 
> 1082.
>
> What is the cause? I am using Cygwin on Windows 7 on a 64-bit machine.
> I ran it a few times and it can't get past the Model 1 training.
>
> Thanks.
>
> Best regards,
>
> Jelita
>
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
