Thanks to everyone who answered! I guess the solution is either to break up the corpus or to run it using mingw-w64. Thanks.
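In case it helps anyone else who hits the 2 GB cygwin limit: here is a minimal sketch of what I mean by breaking up the corpus, assuming plain one-sentence-per-line parallel files. The names corpus.id and corpus.en are placeholders for your own files, not names the scripts use.

  # Split both sides of the parallel corpus into 200k-line chunks.
  # Using the same line count on both sides keeps chunk N of the
  # Indonesian file sentence-aligned with chunk N of the English file.
  split -l 200000 -d corpus.id corpus.part.id.
  split -l 200000 -d corpus.en corpus.part.en.

  # Sanity check: corresponding chunks must have identical line counts.
  for src in corpus.part.id.*; do
      tgt=corpus.part.en.${src##*.}
      [ "$(wc -l < "$src")" -eq "$(wc -l < "$tgt")" ] || echo "mismatch: $src $tgt"
  done

Each chunk pair can then be fed to training separately, which should keep each mgiza process well under the 2 GB ceiling.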
Best regards,

Jelita

On Tue, Nov 13, 2012 at 6:08 AM, Hieu Hoang <[email protected]> wrote:

> On cygwin there is no way to solve the problem: 2 GB is the maximum any
> process can use under cygwin.
>
> If you have large model files, you should move to 64-bit Linux or Mac OS X
> with plenty of memory.
>
> Or try to compile mgiza and Moses without cygwin, for example with mingw or
> Visual Studio. This will require some work; other people may be trying to
> do the same thing, so maybe team up with them.
>
> For mgiza, you can minimize memory use by following
> http://www.statmt.org/moses/?n=Moses.Optimize#ntoc8
> However, you may encounter more memory problems further down the pipeline
> anyway.
>
> I would personally advise against using the Berkeley aligner. IMO it's
> buggy, and questions to its developers go unanswered.
>
> On 12/11/2012 12:56, Jelita Asian wrote:
>
> Hi Barry,
>
> Actually, how do we solve the more-than-2-GB memory problem? Thanks.
>
> Best regards,
>
> Jelita
>
> On Fri, Nov 9, 2012 at 10:51 AM, Jelita Asian <[email protected]> wrote:
>
>> Hi Barry,
>>
>> Thanks. I will look into it now.
>>
>> Cheers,
>>
>> Jelita
>>
>> On Thu, Nov 8, 2012 at 10:09 PM, Barry Haddow <[email protected]> wrote:
>>
>>> Hi Jelita
>>>
>>> It could be running out of memory. Under cygwin, mgiza will be limited
>>> to 2 GB:
>>> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
>>>
>>> cheers - Barry
>>>
>>> On 06/11/12 07:33, Jelita Asian wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running Moses training using the moses-for-mere-mortals scripts.
>>>> The runs used to be OK. However, since I increased the number of words
>>>> (mostly numbers written out as words, where the words act as parallel
>>>> sentences in the Indonesian-English corpus), I keep getting an mgiza
>>>> stack dump, so the training fails.
>>>>
>>>> Here is an extract from the log file of the run:
>>>>
>>>> -----------
>>>> Model1: Iteration 5
>>>> Reading more sentence pairs into memory ...
>>>> [sent:100000]
>>>> Reading more sentence pairs into memory ...
>>>> Reading more sentence pairs into memory ...
>>>> Reading more sentence pairs into memory ...
>>>> Model1: (5) TRAIN CROSS-ENTROPY 5.82453 PERPLEXITY 56.6706
>>>> Model1: (5) VITERBI TRAIN CROSS-ENTROPY 6.59753 PERPLEXITY 96.8401
>>>> Model 1 Iteration: 5 took: 87 seconds
>>>> Entire Model1 Training took: 444 seconds
>>>> NOTE: I am doing iterations with the HMM model!
>>>> Read classes: #words: 48562 #classes: 51
>>>> Actual number of read words: 48561 stored words: 48561
>>>> Read classes: #words: 45484 #classes: 51
>>>> Actual number of read words: 45483 stored words: 45483
>>>>
>>>> ==========================================================
>>>> Hmm Training Started at: Tue Nov 6 12:46:41 2012
>>>>
>>>> ./train-AllCorpusIndo.sh: line 1184: 3936 Aborted (core dumped)
>>>> $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
>>>> $modeldir/$lang2-$lang1-int-train.snt -o $modeldir/$lang2-$lang1 -s
>>>> $modeldir/$lang1.vcb -t $modeldir/$lang2.vcb -coocurrencefile
>>>> $modeldir/$lang1-$lang2.cooc -ml $ml -countincreasecutoff
>>>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>>>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff
>>>> $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2
>>>> $model2iterations -mh $hmmiterations -m3 $model3iterations -m4
>>>> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1
>>>> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency
>>>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps
>>>> $onlyaldumps -nodumps $nodumps -compactadtable $compactadtable
>>>> -model4smoothfactor $model4smoothfactor -compactalignmentformat
>>>> $compactalignmentformat -verbose $verbose -verbosesentence $verbosesentence
>>>> -emalsmooth $emalsmooth -model23smoothfactor $model23smoothfactor
>>>> -model4smoothfactor $model4smoothfactor -model5smoothfactor
>>>> $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral
>>>> -deficientdistortionforemptyword $deficientdistortionforemptyword -depm4
>>>> $depm4 -depm5 $depm5 -emalignmentdependencies $emalignmentdependencies
>>>> -emprobforempty $emprobforempty -m5p0 $m5p0 -manlexfactor1 $manlexfactor1
>>>> -manlexfactor2 $manlexfactor2 -manlexmaxmultiplicity $manlexmaxmultiplicity
>>>> -maxfertility $maxfertility -p0 $p0 -pegging $pegging
>>>> Starting MGIZA
>>>> Initializing Global Paras
>>>> DEBUG: Enter
>>>> DEBUG: Prefix
>>>> DEBUG: Log
>>>> Parsing Arguments
>>>> Parameter 'ncpus' changed from '2' to '8'
>>>> Parameter 'c' changed from '' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en-int-train.snt'
>>>> Parameter 'o' changed from '112-11-06.124815.Jelita' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en'
>>>> Parameter 's' changed from '' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en.vcb'
>>>> Parameter 't' changed from '' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id.vcb'
>>>> Parameter 'coocurrencefile' changed from '' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.cooc'
>>>> Parameter 'm3' changed from '5' to '3'
>>>> Parameter 'm4' changed from '5' to '3'
>>>> Parameter 'onlyaldumps' changed from '0' to '1'
>>>> Parameter 'nodumps' changed from '0' to '1'
>>>> Parameter 'model4smoothfactor' changed from '0.2' to '0.4'
>>>> Parameter 'nsmooth' changed from '64' to '4'
>>>> Parameter 'p0' changed from '-1' to '0.999'
>>>> general parameters:
>>>> -------------------
>>>> ml = 101 (maximum sentence length)
>>>>
>>>> Here is another extract:
>>>>
>>>> ./train-AllCorpusIndo.sh: line 1184: 2756 Aborted (core dumped)
>>>> $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
>>>> $modeldir/$lang1-$lang2-int-train.snt -o $modeldir/$lang1-$lang2 -s
>>>> $modeldir/$lang2.vcb -t $modeldir/$lang1.vcb -coocurrencefile
>>>> $modeldir/$lang2-$lang1.cooc -ml $ml -countincreasecutoff
>>>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>>>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff
>>>> $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2
>>>> $model2iterations -mh $hmmiterations -m3 $model3iterations -m4
>>>> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1
>>>> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency
>>>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps
>>>> $onlyaldumps -nodumps $nodumps -compactadtable $compactadtable
>>>> -model4smoothfactor $model4smoothfactor -compactalignmentformat
>>>> $compactalignmentformat -verbose $verbose -verbosesentence $verbosesentence
>>>> -emalsmooth $emalsmooth -model23smoothfactor $model23smoothfactor
>>>> -model4smoothfactor $model4smoothfactor -model5smoothfactor
>>>> $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral
>>>> -deficientdistortionforemptyword $deficientdistortionforemptyword -depm4
>>>> $depm4 -depm5 $depm5 -emalignmentdependencies $emalignmentdependencies
>>>> -emprobforempty $emprobforempty -m5p0 $m5p0 -manlexfactor1 $manlexfactor1
>>>> -manlexfactor2 $manlexfactor2 -manlexmaxmultiplicity $manlexmaxmultiplicity
>>>> -maxfertility $maxfertility -p0 $p0 -pegging $pegging
>>>> ****** phase 2.1 of training (merge alignments)
>>>> Traceback (most recent call last):
>>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line 24, in <module>
>>>>     files.append(open(sys.argv[i],"r"));
>>>> IOError: [Errno 2] No such file or directory:
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.part*'
>>>> Traceback (most recent call last):
>>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line 24, in <module>
>>>>     files.append(open(sys.argv[i],"r"));
>>>> IOError: [Errno 2] No such file or directory:
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.part*'
>>>> ****** Rest of parallel training
>>>> Using SCRIPTS_ROOTDIR: /home/Jelita/moses/tools/moses/scripts
>>>> Using single-thread GIZA
>>>> (3) generate word alignment @ Tue Nov 6 13:07:31 SEAST 2012
>>>> Combining forward and inverted alignment from files:
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.{bz2,gz}
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.{bz2,gz}
>>>> Executing: mkdir -p
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6
>>>> Executing: /home/Jelita/moses/tools/moses/scripts/training/symal/giza2bal.pl -d "gzip -cd
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.gz"
>>>> -i "gzip -cd
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.gz"
>>>> | /home/Jelita/moses/tools/moses/scripts/training/symal/symal
>>>> -alignment="grow" -diagonal="yes" -final="yes" -both="yes" >
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/aligned.grow-diag-final-and
>>>>
>>>> symal: computing grow alignment: diagonal (1) final (1) both-uncovered (1)
>>>> skip=<0> counts=<0>
>>>> (4) generate lexical translation table 0-0 @ Tue Nov 6 13:07:31 SEAST 2012
>>>> (/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.id,/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.en,/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/lex)
>>>>
>>>> Use of uninitialized value $a in scalar chomp at
>>>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1079.
>>>> Use of uninitialized value $a in split at
>>>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1082.
>>>>
>>>> What is the cause? I use cygwin on Windows 7 on a 64-bit machine. I have
>>>> run it a few times and it can't get past the Model 1 training.
>>>>
>>>> Thanks.
>>>>
>>>> Best regards,
>>>>
>>>> Jelita
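P.S. One note on the Python traceback above, since it can be misleading: the merge_alignment.py IOError is a downstream symptom, not a separate bug. Because mgiza aborted, it never wrote the *.A3.final.part* files, so the merge step had nothing to open. A hypothetical guard in the training script (a sketch only, not a tested patch, reusing the script's own $modeldir variable) would surface the real failure instead:

  # Only attempt the merge if mgiza actually wrote its part files;
  # otherwise report the real failure instead of a confusing IOError.
  if ! ls "$modeldir"/*.A3.final.part* >/dev/null 2>&1; then
      echo "mgiza aborted before writing *.A3.final.part* files" >&2
      exit 1
  fi
  # ... the existing merge_alignment.py calls follow here unchanged ...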
