MGIZA++ can be compiled with VC++, both 32- and 64-bit; however, the 64-bit version occasionally crashes during the final clean-up, so YMMV. Also, mingw-w64 is out there; maybe you can try that (msys + mingw-w64):

http://stackoverflow.com/questions/9942923/mingw-as-a-reliable-64-bit-gcc-compiler
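A rough sketch of that route, assuming the CMake build of the current mgiza sources works under MSYS (the repository URL, the generator name, and the Boost dependency below are assumptions, not something tested here):

    # Hypothetical MSYS + mingw-w64 build; assumes a mingw-w64 gcc and the
    # Boost libraries are already installed in the MSYS environment.
    git clone https://github.com/moses-smt/mgiza.git
    cd mgiza/mgizapp
    cmake -G "MSYS Makefiles" -DCMAKE_BUILD_TYPE=Release .
    make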
--Q

On Mon, Nov 12, 2012 at 8:14 AM, Kenneth Heafield <[email protected]> wrote:
> Hi Jelita,
>
> mgiza claims to support native Windows via Visual Studio, but it appears
> to have been tested only with 32-bit. You are on your own to get it
> working on 64-bit Windows and to integrate it into the scripts.
>
> Kenneth
>
> On 11/12/12 16:09, Philipp Koehn wrote:
> > Hi,
> >
> > one (not completely satisfying) solution is to break up
> > the corpus and run MGIZA++ separately on each part.
> >
> > -phi
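A minimal sketch of that workaround, assuming a sentence-aligned corpus in corpus.id / corpus.en (the file names, the part count, and the concatenation of per-part results are illustrative assumptions):

    # Split a parallel corpus into N equal parts, keeping the two sides in
    # sync, so MGIZA++ can word-align each part in a separate run.
    N=4                                    # number of parts (assumption)
    lines=$(wc -l < corpus.id)
    per=$(( (lines + N - 1) / N ))         # ceiling division
    split -l "$per" -d corpus.id part.id.  # part.id.00, part.id.01, ...
    split -l "$per" -d corpus.en part.en.
    # Run the usual mkcls/snt2cooc/mgiza pipeline on each part pair, then
    # concatenate the per-part alignment files in the same order.

Because each part is then aligned with a model trained only on that part, the result is somewhat worse than aligning the whole corpus at once, which is why this is not a completely satisfying solution.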
> > On Mon, Nov 12, 2012 at 4:56 AM, Jelita Asian <[email protected]> wrote:
> >> Hi Barry,
> >>
> >> Actually, how do we solve the more-than-2 GB memory problem? Thanks.
> >>
> >> Best regards,
> >>
> >> Jelita
> >>
> >> On Fri, Nov 9, 2012 at 10:51 AM, Jelita Asian <[email protected]> wrote:
> >>> Hi Barry,
> >>>
> >>> Thanks. I will look into it now.
> >>>
> >>> Cheers,
> >>>
> >>> Jelita
> >>>
> >>> On Thu, Nov 8, 2012 at 10:09 PM, Barry Haddow <[email protected]> wrote:
> >>>> Hi Jelita
> >>>>
> >>>> It could be running out of memory. Under cygwin, mgiza will be limited to 2GB:
> >>>> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
> >>>>
> >>>> cheers - Barry
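Barry's 2 GB figure is the 32-bit address-space cap: a 32-bit mgiza binary running under Cygwin cannot use more than about 2 GB, however much RAM the machine has. A quick check, using the tool path from the log below (output wording varies by `file` version):

    # A 32-bit executable gets at most a 2 GB user address space on
    # Windows; if mgiza grows toward 2 GB in Task Manager and then
    # aborts, that matches Barry's diagnosis.
    file /home/Jelita/moses/tools/mgiza/bin/mgiza   # PE32 = 32-bit, PE32+ = 64-bit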
> >>>> On 06/11/12 07:33, Jelita Asian wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I am running Moses training using the moses-for-mere-mortals scripts. The runs used to be fine. However, since I increased the number of words (mostly numbers written out as words, used as parallel sentences in the Indonesian-English corpus), I keep getting an mgiza stack dump, so the training fails.
> >>>>>
> >>>>> Here is an extract from the run's log file:
> >>>>>
> >>>>> -----------
> >>>>> Model1: Iteration 5
> >>>>> Reading more sentence pairs into memory ...
> >>>>> [sent:100000]
> >>>>> Reading more sentence pairs into memory ...
> >>>>> Reading more sentence pairs into memory ...
> >>>>> Reading more sentence pairs into memory ...
> >>>>> Model1: (5) TRAIN CROSS-ENTROPY 5.82453 PERPLEXITY 56.6706
> >>>>> Model1: (5) VITERBI TRAIN CROSS-ENTROPY 6.59753 PERPLEXITY 96.8401
> >>>>> Model 1 Iteration: 5 took: 87 seconds
> >>>>> Entire Model1 Training took: 444 seconds
> >>>>> NOTE: I am doing iterations with the HMM model!
> >>>>> Read classes: #words: 48562 #classes: 51
> >>>>> Actual number of read words: 48561 stored words: 48561
> >>>>> Read classes: #words: 45484 #classes: 51
> >>>>> Actual number of read words: 45483 stored words: 45483
> >>>>>
> >>>>> ==========================================================
> >>>>> Hmm Training Started at: Tue Nov 6 12:46:41 2012
> >>>>>
> >>>>> ./train-AllCorpusIndo.sh: line 1184: 3936 Aborted (core dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c $modeldir/$lang2-$lang1-int-train.snt -o $modeldir/$lang2-$lang1 -s $modeldir/$lang1.vcb -t $modeldir/$lang2.vcb -coocurrencefile $modeldir/$lang1-$lang2.cooc -ml $ml -countincreasecutoff $countincreasecutoff -countincreasecutoffal $countincreasecutoffal -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2 $model2iterations -mh $hmmiterations -m3 $model3iterations -m4 $model4iterations -m5 $model5iterations -m6 $model6iterations -t1 $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps $onlyaldumps -nodumps $nodumps -compactadtable $compactadtable -model4smoothfactor $model4smoothfactor -compactalignmentformat $compactalignmentformat -verbose $verbose -verbosesentence $verbosesentence -emalsmooth $emalsmooth -model23smoothfactor $model23smoothfactor -model4smoothfactor $model4smoothfactor -model5smoothfactor $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral -deficientdistortionforemptyword $deficientdistortionforemptyword -depm4 $depm4 -depm5 $depm5 -emalignmentdependencies $emalignmentdependencies -emprobforempty $emprobforempty -m5p0 $m5p0 -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2 -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility $maxfertility -p0 $p0 -pegging $pegging
> >>>>> Starting MGIZA
> >>>>> Initializing Global Paras
> >>>>> DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments
> >>>>> Parameter 'ncpus' changed from '2' to '8'
> >>>>> Parameter 'c' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en-int-train.snt'
> >>>>> Parameter 'o' changed from '112-11-06.124815.Jelita' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en'
> >>>>> Parameter 's' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en.vcb'
> >>>>> Parameter 't' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id.vcb'
> >>>>> Parameter 'coocurrencefile' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.cooc'
> >>>>> Parameter 'm3' changed from '5' to '3'
> >>>>> Parameter 'm4' changed from '5' to '3'
> >>>>> Parameter 'onlyaldumps' changed from '0' to '1'
> >>>>> Parameter 'nodumps' changed from '0' to '1'
> >>>>> Parameter 'model4smoothfactor' changed from '0.2' to '0.4'
> >>>>> Parameter 'nsmooth' changed from '64' to '4'
> >>>>> Parameter 'p0' changed from '-1' to '0.999'
> >>>>> general parameters:
> >>>>> -------------------
> >>>>> ml = 101 (maximum sentence length)
> >>>>>
> >>>>> Here is another extract:
> >>>>>
> >>>>> ./train-AllCorpusIndo.sh: line 1184: 2756 Aborted (core dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c $modeldir/$lang1-$lang2-int-train.snt -o $modeldir/$lang1-$lang2 -s $modeldir/$lang2.vcb -t $modeldir/$lang1.vcb -coocurrencefile $modeldir/$lang2-$lang1.cooc -ml $ml -countincreasecutoff $countincreasecutoff -countincreasecutoffal $countincreasecutoffal -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2 $model2iterations -mh $hmmiterations -m3 $model3iterations -m4 $model4iterations -m5 $model5iterations -m6 $model6iterations -t1 $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps $onlyaldumps -nodumps $nodumps -compactadtable $compactadtable -model4smoothfactor $model4smoothfactor -compactalignmentformat $compactalignmentformat -verbose $verbose -verbosesentence $verbosesentence -emalsmooth $emalsmooth -model23smoothfactor $model23smoothfactor -model4smoothfactor $model4smoothfactor -model5smoothfactor $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral -deficientdistortionforemptyword $deficientdistortionforemptyword -depm4 $depm4 -depm5 $depm5 -emalignmentdependencies $emalignmentdependencies -emprobforempty $emprobforempty -m5p0 $m5p0 -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2 -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility $maxfertility -p0 $p0 -pegging $pegging
> >>>>> ****** phase 2.1 of training (merge alignments)
> >>>>> Traceback (most recent call last):
> >>>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line 24, in <module>
> >>>>>     files.append(open(sys.argv[i],"r"));
> >>>>> IOError: [Errno 2] No such file or directory: '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.part*'
> >>>>> Traceback (most recent call last):
> >>>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line 24, in <module>
> >>>>>     files.append(open(sys.argv[i],"r"));
> >>>>> IOError: [Errno 2] No such file or directory: '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.part*'
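These IOErrors are a downstream symptom: mgiza aborted before writing any *.A3.final.part* files, so merge_alignment.py has nothing to open. A minimal guard for the merge step, assuming merge_alignment.py takes the part files as arguments and writes the merged alignment to stdout (as the script above invokes it):

    # Only attempt the merge when mgiza actually produced its per-thread
    # part files; otherwise surface the real failure, not a traceback.
    parts=( "$modeldir"/id-en.A3.final.part* )
    if [ -e "${parts[0]}" ]; then
      python "$toolsdir"/mgiza/scripts/merge_alignment.py "${parts[@]}" \
        > "$modeldir"/id-en.A3.final
    else
      echo "no alignment part files - mgiza must have failed earlier" >&2
      exit 1
    fi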
> >>>>> ****** Rest of parallel training
> >>>>> Using SCRIPTS_ROOTDIR: /home/Jelita/moses/tools/moses/scripts
> >>>>> Using single-thread GIZA
> >>>>> (3) generate word alignment @ Tue Nov 6 13:07:31 SEAST 2012
> >>>>> Combining forward and inverted alignment from files:
> >>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.{bz2,gz}
> >>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.{bz2,gz}
> >>>>> Executing: mkdir -p /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6
> >>>>> Executing: /home/Jelita/moses/tools/moses/scripts/training/symal/giza2bal.pl -d "gzip -cd /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.gz" -i "gzip -cd /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.gz" | /home/Jelita/moses/tools/moses/scripts/training/symal/symal -alignment="grow" -diagonal="yes" -final="yes" -both="yes" > /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/aligned.grow-diag-final-and
> >>>>>
> >>>>> symal: computing grow alignment: diagonal (1) final (1) both-uncovered (1)
> >>>>> skip=<0> counts=<0>
> >>>>> (4) generate lexical translation table 0-0 @ Tue Nov 6 13:07:31 SEAST 2012
> >>>>> (/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.id,/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.en,/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/lex)
> >>>>>
> >>>>> Use of uninitialized value $a in scalar chomp at /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1079.
> >>>>> Use of uninitialized value $a in split at /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1082.
> >>>>>
> >>>>> What is the cause? I am using Cygwin on 64-bit Windows 7. I have run it a few times, and it cannot get past the Model 1 training.
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Jelita
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
