Hi Barry,

Actually, how do we solve the more-than-2 GB memory problem?

Thanks.
Best regards,

Jelita

On Fri, Nov 9, 2012 at 10:51 AM, Jelita Asian <[email protected]> wrote:
> Hi Barry,
>
> Thanks. I will look into it now.
>
> Cheers,
>
> Jelita
>
>
> On Thu, Nov 8, 2012 at 10:09 PM, Barry Haddow <[email protected]> wrote:
>
>> Hi Jelita
>>
>> It could be running out of memory. Under cygwin, mgiza will be limited to 2GB:
>> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
>>
>> cheers - Barry
>>
>>
>> On 06/11/12 07:33, Jelita Asian wrote:
>>
>>> Hi,
>>>
>>> I run Moses training using the moses-for-mere-mortals scripts. The run used to be fine. However, since I increased the number of words (mostly numbers written out as words, which act as parallel sentences in the Indonesian-English corpus), I keep getting an mgiza stack dump, so the training fails.
>>>
>>> Here is an extract from the log file of the run:
>>>
>>> -----------
>>> Model1: Iteration 5
>>> Reading more sentence pairs into memory ...
>>> [sent:100000]
>>> Reading more sentence pairs into memory ...
>>> Reading more sentence pairs into memory ...
>>> Reading more sentence pairs into memory ...
>>> Model1: (5) TRAIN CROSS-ENTROPY 5.82453 PERPLEXITY 56.6706
>>> Model1: (5) VITERBI TRAIN CROSS-ENTROPY 6.59753 PERPLEXITY 96.8401
>>> Model 1 Iteration: 5 took: 87 seconds
>>> Entire Model1 Training took: 444 seconds
>>> NOTE: I am doing iterations with the HMM model!
>>> Read classes: #words: 48562 #classes: 51
>>> Actual number of read words: 48561 stored words: 48561
>>> Read classes: #words: 45484 #classes: 51
>>> Actual number of read words: 45483 stored words: 45483
>>>
>>> ==========================================================
>>> Hmm Training Started at: Tue Nov 6 12:46:41 2012
>>>
>>> ./train-AllCorpusIndo.sh: line 1184: 3936 Aborted (core dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c $modeldir/$lang2-$lang1-int-train.snt -o $modeldir/$lang2-$lang1 -s $modeldir/$lang1.vcb -t $modeldir/$lang2.vcb -coocurrencefile $modeldir/$lang1-$lang2.cooc -ml $ml -countincreasecutoff $countincreasecutoff -countincreasecutoffal $countincreasecutoffal -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2 $model2iterations -mh $hmmiterations -m3 $model3iterations -m4 $model4iterations -m5 $model5iterations -m6 $model6iterations -t1 $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps $onlyaldumps -nodumps $nodumps -compactadtable $compactadtable -model4smoothfactor $model4smoothfactor -compactalignmentformat $compactalignmentformat -verbose $verbose -verbosesentence $verbosesentence -emalsmooth $emalsmooth -model23smoothfactor $model23smoothfactor -model4smoothfactor $model4smoothfactor -model5smoothfactor $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral -deficientdistortionforemptyword $deficientdistortionforemptyword -depm4 $depm4 -depm5 $depm5 -emalignmentdependencies $emalignmentdependencies -emprobforempty $emprobforempty -m5p0 $m5p0 -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2 -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility $maxfertility -p0 $p0 -pegging $pegging
>>> Starting MGIZA
>>> Initializing Global Paras
>>> DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments
>>> Parameter 'ncpus' changed from '2' to '8'
>>> Parameter 'c' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en-int-train.snt'
>>> Parameter 'o' changed from '112-11-06.124815.Jelita' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en'
>>> Parameter 's' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en.vcb'
>>> Parameter 't' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id.vcb'
>>> Parameter 'coocurrencefile' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.cooc'
>>> Parameter 'm3' changed from '5' to '3'
>>> Parameter 'm4' changed from '5' to '3'
>>> Parameter 'onlyaldumps' changed from '0' to '1'
>>> Parameter 'nodumps' changed from '0' to '1'
>>> Parameter 'model4smoothfactor' changed from '0.2' to '0.4'
>>> Parameter 'nsmooth' changed from '64' to '4'
>>> Parameter 'p0' changed from '-1' to '0.999'
>>> general parameters:
>>> -------------------
>>> ml = 101 (maximum sentence length)
>>>
>>> Here is another extract:
>>>
>>> ./train-AllCorpusIndo.sh: line 1184: 2756 Aborted (core dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c $modeldir/$lang1-$lang2-int-train.snt -o $modeldir/$lang1-$lang2 -s $modeldir/$lang2.vcb -t $modeldir/$lang1.vcb -coocurrencefile $modeldir/$lang2-$lang1.cooc -ml $ml -countincreasecutoff $countincreasecutoff -countincreasecutoffal $countincreasecutoffal -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2 $model2iterations -mh $hmmiterations -m3 $model3iterations -m4 $model4iterations -m5 $model5iterations -m6 $model6iterations -t1 $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps $onlyaldumps -nodumps $nodumps
-compactadtable $compactadtable -model4smoothfactor $model4smoothfactor -compactalignmentformat $compactalignmentformat -verbose $verbose -verbosesentence $verbosesentence -emalsmooth $emalsmooth -model23smoothfactor $model23smoothfactor -model4smoothfactor $model4smoothfactor -model5smoothfactor $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral -deficientdistortionforemptyword $deficientdistortionforemptyword -depm4 $depm4 -depm5 $depm5 -emalignmentdependencies $emalignmentdependencies -emprobforempty $emprobforempty -m5p0 $m5p0 -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2 -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility $maxfertility -p0 $p0 -pegging $pegging
>>> ****** phase 2.1 of training (merge alignments)
>>> Traceback (most recent call last):
>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line 24, in <module>
>>>     files.append(open(sys.argv[i],"r"));
>>> IOError: [Errno 2] No such file or directory: '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.part*'
>>> Traceback (most recent call last):
>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line 24, in <module>
>>>     files.append(open(sys.argv[i],"r"));
>>> IOError: [Errno 2] No such file or directory: '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.part*'
>>> ****** Rest of parallel training
>>> Using SCRIPTS_ROOTDIR: /home/Jelita/moses/tools/moses/scripts
>>> Using single-thread GIZA
>>> (3) generate word alignment @ Tue Nov 6 13:07:31 SEAST 2012
>>> Combining forward and inverted alignment from files:
>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.{bz2,gz}
>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.{bz2,gz}
>>> Executing: mkdir -p /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6
>>> Executing: /home/Jelita/moses/tools/moses/scripts/training/symal/giza2bal.pl -d "gzip -cd /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.gz" -i "gzip -cd /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.gz" | /home/Jelita/moses/tools/moses/scripts/training/symal/symal -alignment="grow" -diagonal="yes" -final="yes" -both="yes" > /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/aligned.grow-diag-final-and
>>>
>>> symal: computing grow alignment: diagonal (1) final (1) both-uncovered (1) skip=<0> counts=<0>
>>> (4) generate lexical translation table 0-0 @ Tue Nov 6 13:07:31 SEAST 2012
>>> (/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.id,/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.en,/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/lex)
>>>
>>> Use of uninitialized value $a in scalar chomp at /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1079.
>>> Use of uninitialized value $a in split at /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1082.
>>>
>>> What is the cause? I use cygwin for Windows 7 on a 64-bit machine.
>>> I ran it a few times and it can't get past the Model 1 training.
>>>
>>> Thanks.
>>>
>>> Best regards,
>>>
>>> Jelita
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
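[Note on the merge_alignment.py traceback: the IOError is a downstream symptom rather than the root cause. Because mgiza aborted, no .A3.final.part* files were ever written, so the shell leaves the 'part*' glob unexpanded and the script passes the literal pattern straight to open(). A minimal sketch of a clearer guard; the helper name and error message are my own, not part of the real script:]

```python
import glob


def open_alignment_parts(pattern):
    """Open mgiza's A3.final part files, expanding the glob ourselves.

    merge_alignment.py opens each argv entry directly, so when mgiza
    crashes before writing any part files, open() fails on the literal
    'part*' string. Expanding the pattern first turns that into a
    readable diagnosis. (Hypothetical helper, for illustration only.)
    """
    parts = sorted(glob.glob(pattern))
    if not parts:
        raise RuntimeError(
            "no files match %r -- mgiza probably aborted before "
            "writing its alignment parts" % pattern)
    return [open(p, "r") for p in parts]
```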
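[Note on the 2 GB question at the top of the thread: per the Moses FAQ page Barry linked, a 32-bit cygwin process is capped at roughly 2 GB of address space, so the usual remedies are a native 64-bit build of mgiza on Linux, or shrinking the job (smaller corpus chunks, a lower -ml, fewer iterations). A quick sketch, under the assumption that checking the process's pointer width is a useful first diagnostic:]

```python
import platform
import struct

# A 32-bit process can address at most ~2-4 GB regardless of installed RAM,
# which is why mgiza dies once the corpus grows. Check whether this
# environment can actually address more than 2 GB:
bits = struct.calcsize("P") * 8   # pointer width of the running process
print("pointer width:", bits, "bits")   # 64 is needed for >2 GB heaps
print("machine:", platform.machine())   # e.g. x86_64 / AMD64
```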
