Hi Barry,

Thanks. I will look into it now.
Cheers,
Jelita

On Thu, Nov 8, 2012 at 10:09 PM, Barry Haddow <[email protected]> wrote:
> Hi Jelita
>
> It could be running out of memory. Under Cygwin, mgiza will be limited to
> 2GB: http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
>
> cheers - Barry
>
> On 06/11/12 07:33, Jelita Asian wrote:
>> Hi,
>>
>> I am running Moses training with the Moses for Mere Mortals scripts. The
>> runs used to be fine. However, since I increased the number of words
>> (mostly numbers written out as words, used as parallel sentences in the
>> Indonesian-English corpus), I keep getting an mgiza stack dump, and so
>> the training fails.
>>
>> Here is an extract from the run's log file:
>>
>> -----------
>> Model1: Iteration 5
>> Reading more sentence pairs into memory ...
>> [sent:100000]
>> Reading more sentence pairs into memory ...
>> Reading more sentence pairs into memory ...
>> Reading more sentence pairs into memory ...
>> Model1: (5) TRAIN CROSS-ENTROPY 5.82453 PERPLEXITY 56.6706
>> Model1: (5) VITERBI TRAIN CROSS-ENTROPY 6.59753 PERPLEXITY 96.8401
>> Model 1 Iteration: 5 took: 87 seconds
>> Entire Model1 Training took: 444 seconds
>> NOTE: I am doing iterations with the HMM model!
>> Read classes: #words: 48562 #classes: 51
>> Actual number of read words: 48561 stored words: 48561
>> Read classes: #words: 45484 #classes: 51
>> Actual number of read words: 45483 stored words: 45483
>>
>> ==========================================================
>> Hmm Training Started at: Tue Nov 6 12:46:41 2012
>>
>> ./train-AllCorpusIndo.sh: line 1184: 3936 Aborted (core dumped)
>> $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
>> $modeldir/$lang2-$lang1-int-train.snt -o $modeldir/$lang2-$lang1 -s
>> $modeldir/$lang1.vcb -t $modeldir/$lang2.vcb -coocurrencefile
>> $modeldir/$lang1-$lang2.cooc -ml $ml -countincreasecutoff
>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff
>> $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2
>> $model2iterations -mh $hmmiterations -m3 $model3iterations -m4
>> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1
>> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency
>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps
>> $onlyaldumps -nodumps $nodumps -compactadtable $compactadtable
>> -model4smoothfactor $model4smoothfactor -compactalignmentformat
>> $compactalignmentformat -verbose $verbose -verbosesentence $verbosesentence
>> -emalsmooth $emalsmooth -model23smoothfactor $model23smoothfactor
>> -model4smoothfactor $model4smoothfactor -model5smoothfactor
>> $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral
>> -deficientdistortionforemptyword $deficientdistortionforemptyword
>> -depm4 $depm4 -depm5 $depm5 -emalignmentdependencies
>> $emalignmentdependencies -emprobforempty $emprobforempty -m5p0 $m5p0
>> -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2
>> -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility $maxfertility
>> -p0 $p0 -pegging $pegging
>> Starting MGIZA
>> Initializing Global Paras
>> DEBUG: Enter
>> DEBUG: Prefix
>> DEBUG: Log
>> Parsing Arguments
>> Parameter 'ncpus' changed from '2' to '8'
>> Parameter 'c' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en-int-train.snt'
>> Parameter 'o' changed from '112-11-06.124815.Jelita' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en'
>> Parameter 's' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en.vcb'
>> Parameter 't' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id.vcb'
>> Parameter 'coocurrencefile' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.cooc'
>> Parameter 'm3' changed from '5' to '3'
>> Parameter 'm4' changed from '5' to '3'
>> Parameter 'onlyaldumps' changed from '0' to '1'
>> Parameter 'nodumps' changed from '0' to '1'
>> Parameter 'model4smoothfactor' changed from '0.2' to '0.4'
>> Parameter 'nsmooth' changed from '64' to '4'
>> Parameter 'p0' changed from '-1' to '0.999'
>> general parameters:
>> -------------------
>> ml = 101 (maximum sentence length)
>>
>> Here is another extract:
>>
>> ./train-AllCorpusIndo.sh: line 1184: 2756 Aborted (core dumped)
>> $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
>> $modeldir/$lang1-$lang2-int-train.snt -o $modeldir/$lang1-$lang2 -s
>> $modeldir/$lang2.vcb -t $modeldir/$lang1.vcb -coocurrencefile
>> $modeldir/$lang2-$lang1.cooc -ml $ml -countincreasecutoff
>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff
>> $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2
>> $model2iterations -mh $hmmiterations -m3 $model3iterations -m4
>> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1
>> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency
>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps
>> $onlyaldumps -nodumps $nodumps -compactadtable $compactadtable
>> -model4smoothfactor $model4smoothfactor -compactalignmentformat
>> $compactalignmentformat -verbose $verbose -verbosesentence $verbosesentence
>> -emalsmooth $emalsmooth -model23smoothfactor $model23smoothfactor
>> -model4smoothfactor $model4smoothfactor -model5smoothfactor
>> $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral
>> -deficientdistortionforemptyword $deficientdistortionforemptyword
>> -depm4 $depm4 -depm5 $depm5 -emalignmentdependencies
>> $emalignmentdependencies -emprobforempty $emprobforempty -m5p0 $m5p0
>> -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2
>> -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility $maxfertility
>> -p0 $p0 -pegging $pegging
>> ****** phase 2.1 of training (merge alignments)
>> Traceback (most recent call last):
>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line 24, in <module>
>>     files.append(open(sys.argv[i],"r"));
>> IOError: [Errno 2] No such file or directory: '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.part*'
>> Traceback (most recent call last):
>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line 24, in <module>
>>     files.append(open(sys.argv[i],"r"));
>> IOError: [Errno 2] No such file or directory: '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.part*'
>> ****** Rest of parallel training
>> Using SCRIPTS_ROOTDIR: /home/Jelita/moses/tools/moses/scripts
>> Using single-thread GIZA
>> (3) generate word alignment @ Tue Nov 6 13:07:31 SEAST 2012
>> Combining forward and inverted alignment from files:
>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.{bz2,gz}
>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.{bz2,gz}
>> Executing: mkdir -p /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6
>> Executing: /home/Jelita/moses/tools/moses/scripts/training/symal/giza2bal.pl -d "gzip -cd /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.gz" -i "gzip -cd /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.gz" | /home/Jelita/moses/tools/moses/scripts/training/symal/symal -alignment="grow" -diagonal="yes" -final="yes" -both="yes" > /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/aligned.grow-diag-final-and
>>
>> symal: computing grow alignment: diagonal (1) final (1) both-uncovered (1)
>> skip=<0> counts=<0>
>> (4) generate lexical translation table 0-0 @ Tue Nov 6 13:07:31 SEAST 2012
>> (/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.id,/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.en,/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/lex)
>>
>> Use of uninitialized value $a in scalar chomp at
>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1079.
>> Use of uninitialized value $a in split at
>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1082.
>>
>> What is the cause? I am using Cygwin on Windows 7 on a 64-bit machine.
>> I have run the training a few times, and it cannot get past the Model 1
>> training.
>>
>> Thanks.
>>
>> Best regards,
>>
>> Jelita
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
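[Editor's note] The two merge_alignment.py tracebacks in the log are a downstream symptom, not a separate bug. Because mgiza aborted, it never wrote the id-en.A3.final.part* chunk files; with no matching file, bash passes the unexpanded pattern through as a literal argument, so open() fails with ENOENT on a filename that literally contains "part*". A minimal sketch of a pre-merge sanity check follows; the function name, the /tmp/model path, and the part000 filename here are illustrative, not taken from the actual setup:

```python
import glob
import os

def find_alignment_parts(model_dir, prefix):
    """Return the per-thread alignment chunk files mgiza should have written.

    mgiza writes one chunk per CPU (e.g. id-en.A3.final.part000).
    An empty result means mgiza died before producing any output, in which
    case running merge_alignment.py can only fail with a confusing
    "No such file or directory: '...part*'" error.
    """
    pattern = os.path.join(model_dir, prefix + ".A3.final.part*")
    return sorted(glob.glob(pattern))

# Illustrative check before the merge step:
parts = find_alignment_parts("/tmp/model", "id-en")
if not parts:
    print("no A3.final.part files found -- mgiza crashed upstream; "
          "fix that first (e.g. the Cygwin 2GB limit) before merging")
```

If the list comes back empty, the fix belongs upstream (for instance the 2GB Cygwin address-space limit Barry points to), not in the merge step.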
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
