I don't know if Berkeley Aligner works under Windows, but since it's written in Java I strongly suspect that it would. If so, you could try doing the word alignment with it instead of mgiza.
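For what it's worth, the Berkeley Aligner is driven by a plain "java -jar" call, so on a 64-bit JVM you can also hand it far more heap than the 2 GB that cygwin allows mgiza. Roughly along these lines -- the jar and config file names below are only placeholders, so check the README of whichever release you download for the real options:

  # Placeholder names: berkeleyaligner.jar and word-align.conf stand in for
  # whatever the release you download actually ships. -Xmx4g asks the JVM
  # for a 4 GB heap, which a 64-bit Java on Windows can provide.
  java -Xmx4g -jar berkeleyaligner.jar ++word-align.conf

I haven't checked how closely its output format matches what train-model.perl expects, so you may need a small conversion step before feeding the alignments back into the Moses pipeline.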
Cheers,
Lane

On Mon, Nov 12, 2012 at 7:56 AM, Jelita Asian <[email protected]> wrote:
> Hi Barry,
>
> Actually, how do we solve the more than 2 GB memory problem? Thanks.
>
> Best regards,
>
> Jelita
>
>
> On Fri, Nov 9, 2012 at 10:51 AM, Jelita Asian <[email protected]> wrote:
>>
>> Hi Barry,
>>
>> Thanks. I will look into it now.
>>
>> Cheers,
>>
>> Jelita
>>
>>
>> On Thu, Nov 8, 2012 at 10:09 PM, Barry Haddow <[email protected]> wrote:
>>>
>>> Hi Jelita
>>>
>>> It could be running out of memory. Under cygwin, mgiza will be limited to 2GB:
>>> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
>>>
>>> cheers - Barry
>>>
>>>
>>> On 06/11/12 07:33, Jelita Asian wrote:
>>>>
>>>> Hi,
>>>>
>>>> I run Moses training using the moses-for-mere-mortals scripts. The run used
>>>> to be OK. However, since I increased the number of words (mostly numbers
>>>> written out as words, where the words act as parallel sentences in the
>>>> Indonesian-English corpus), I keep getting an mgiza stack dump, so the
>>>> training fails.
>>>>
>>>> Here is an extract from the log file of the run:
>>>>
>>>> -----------
>>>> Model1: Iteration 5
>>>> Reading more sentence pairs into memory ...
>>>> [sent:100000]
>>>> Reading more sentence pairs into memory ...
>>>> Reading more sentence pairs into memory ...
>>>> Reading more sentence pairs into memory ...
>>>> Model1: (5) TRAIN CROSS-ENTROPY 5.82453 PERPLEXITY 56.6706
>>>> Model1: (5) VITERBI TRAIN CROSS-ENTROPY 6.59753 PERPLEXITY 96.8401
>>>> Model 1 Iteration: 5 took: 87 seconds
>>>> Entire Model1 Training took: 444 seconds
>>>> NOTE: I am doing iterations with the HMM model!
>>>> Read classes: #words: 48562 #classes: 51
>>>> Actual number of read words: 48561 stored words: 48561
>>>> Read classes: #words: 45484 #classes: 51
>>>> Actual number of read words: 45483 stored words: 45483
>>>>
>>>> ==========================================================
>>>> Hmm Training Started at: Tue Nov 6 12:46:41 2012
>>>>
>>>> ./train-AllCorpusIndo.sh: line 1184: 3936 Aborted (core dumped) $toolsdir/mgiza/bin/mgiza
>>>> -ncpus $mgizanumprocessors -c $modeldir/$lang2-$lang1-int-train.snt -o $modeldir/$lang2-$lang1
>>>> -s $modeldir/$lang1.vcb -t $modeldir/$lang2.vcb -coocurrencefile $modeldir/$lang1-$lang2.cooc
>>>> -ml $ml -countincreasecutoff $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>>>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff $probcutoff
>>>> -probsmooth $probsmooth -m1 $model1iterations -m2 $model2iterations -mh $hmmiterations
>>>> -m3 $model3iterations -m4 $model4iterations -m5 $model5iterations -m6 $model6iterations
>>>> -t1 $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency
>>>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps $onlyaldumps
>>>> -nodumps $nodumps -compactadtable $compactadtable -model4smoothfactor $model4smoothfactor
>>>> -compactalignmentformat $compactalignmentformat -verbose $verbose
>>>> -verbosesentence $verbosesentence -emalsmooth $emalsmooth
>>>> -model23smoothfactor $model23smoothfactor -model4smoothfactor $model4smoothfactor
>>>> -model5smoothfactor $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral
>>>> -deficientdistortionforemptyword $deficientdistortionforemptyword -depm4 $depm4 -depm5 $depm5
>>>> -emalignmentdependencies $emalignmentdependencies -emprobforempty $emprobforempty
>>>> -m5p0 $m5p0 -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2
>>>> -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility $maxfertility
>>>> -p0 $p0 -pegging $pegging
>>>> Starting MGIZA
>>>> Initializing Global Paras
>>>> DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments
>>>> Parameter 'ncpus' changed from '2' to '8'
>>>> Parameter 'c' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en-int-train.snt'
>>>> Parameter 'o' changed from '112-11-06.124815.Jelita' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en'
>>>> Parameter 's' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en.vcb'
>>>> Parameter 't' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id.vcb'
>>>> Parameter 'coocurrencefile' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.cooc'
>>>> Parameter 'm3' changed from '5' to '3'
>>>> Parameter 'm4' changed from '5' to '3'
>>>> Parameter 'onlyaldumps' changed from '0' to '1'
>>>> Parameter 'nodumps' changed from '0' to '1'
>>>> Parameter 'model4smoothfactor' changed from '0.2' to '0.4'
>>>> Parameter 'nsmooth' changed from '64' to '4'
>>>> Parameter 'p0' changed from '-1' to '0.999'
>>>> general parameters:
>>>> -------------------
>>>> ml = 101 (maximum sentence length)
>>>>
>>>> Here is another extract:
>>>>
>>>> ./train-AllCorpusIndo.sh: line 1184: 2756 Aborted (core dumped) $toolsdir/mgiza/bin/mgiza
>>>> -ncpus $mgizanumprocessors -c $modeldir/$lang1-$lang2-int-train.snt -o $modeldir/$lang1-$lang2
>>>> -s $modeldir/$lang2.vcb -t $modeldir/$lang1.vcb -coocurrencefile $modeldir/$lang2-$lang1.cooc
>>>> -ml $ml -countincreasecutoff $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>>>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff $probcutoff
>>>> -probsmooth $probsmooth -m1 $model1iterations -m2 $model2iterations -mh $hmmiterations
>>>> -m3 $model3iterations -m4 $model4iterations -m5 $model5iterations -m6 $model6iterations
>>>> -t1 $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency
>>>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps $onlyaldumps
>>>> -nodumps $nodumps -compactadtable $compactadtable -model4smoothfactor $model4smoothfactor
>>>> -compactalignmentformat $compactalignmentformat -verbose $verbose
>>>> -verbosesentence $verbosesentence -emalsmooth $emalsmooth
>>>> -model23smoothfactor $model23smoothfactor -model4smoothfactor $model4smoothfactor
>>>> -model5smoothfactor $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral
>>>> -deficientdistortionforemptyword $deficientdistortionforemptyword -depm4 $depm4 -depm5 $depm5
>>>> -emalignmentdependencies $emalignmentdependencies -emprobforempty $emprobforempty
>>>> -m5p0 $m5p0 -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2
>>>> -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility $maxfertility
>>>> -p0 $p0 -pegging $pegging
>>>> ****** phase 2.1 of training (merge alignments)
>>>> Traceback (most recent call last):
>>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line 24, in <module>
>>>>     files.append(open(sys.argv[i],"r"));
>>>> IOError: [Errno 2] No such file or directory: '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.part*'
>>>> Traceback (most recent call last):
>>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line 24, in <module>
>>>>     files.append(open(sys.argv[i],"r"));
>>>> IOError: [Errno 2] No such file or directory: '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.part*'
>>>> ****** Rest of parallel training
>>>> Using SCRIPTS_ROOTDIR: /home/Jelita/moses/tools/moses/scripts
>>>> Using single-thread GIZA
>>>> (3) generate word alignment @ Tue Nov 6 13:07:31 SEAST 2012
>>>> Combining forward and inverted alignment from files:
>>>>   /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.{bz2,gz}
>>>>   /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.{bz2,gz}
>>>> Executing: mkdir -p /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6
>>>> Executing: /home/Jelita/moses/tools/moses/scripts/training/symal/giza2bal.pl -d "gzip -cd /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.gz" -i "gzip -cd /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.gz" | /home/Jelita/moses/tools/moses/scripts/training/symal/symal -alignment="grow" -diagonal="yes" -final="yes" -both="yes" > /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/aligned.grow-diag-final-and
>>>>
>>>> symal: computing grow alignment: diagonal (1) final (1) both-uncovered (1)
>>>> skip=<0> counts=<0>
>>>> (4) generate lexical translation table 0-0 @ Tue Nov 6 13:07:31 SEAST 2012
>>>> (/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.id,/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.en,/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/lex)
>>>>
>>>> !Use of uninitialized value $a in scalar chomp at /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1079.
>>>> Use of uninitialized value $a in split at /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1082.
>>>>
>>>> What is the cause? I use cygwin on Windows 7 on a 64-bit machine.
>>>> I have run it a few times and it cannot get past the Model 1 training.
>>>>
>>>> Thanks.
>>>>
>>>> Best regards,
>>>>
>>>> Jelita
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
--
When a place gets crowded enough to require ID's, social collapse is not
far away. It is time to go elsewhere. The best thing about space travel
is that it made it possible to go elsewhere.
        -- R.A. Heinlein, "Time Enough For Love"

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
