MGIZA++ can be compiled with VC++, both 32-bit and 64-bit; however, the
64-bit version occasionally crashes during the final cleanup, so YMMV.
Also, I believe mingw-w64 is out there now; maybe you can try that
(msys + mingw-w64):

http://stackoverflow.com/questions/9942923/mingw-as-a-reliable-64-bit-gcc-compiler
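
In case it helps, here is the rough build recipe I would try for a
mingw-w64 build (a minimal sketch only -- it assumes the CMake build files
under mgizapp/ in the source tree and an MSYS shell with the mingw-w64 gcc
first on PATH; I have not verified this on 64-bit Windows):

    # from an MSYS shell, inside the mgiza source tree
    cd mgizapp
    cmake -G "MSYS Makefiles" -DCMAKE_BUILD_TYPE=Release .
    make
    make install    # installs mgiza and friends under the configured prefix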


--Q



On Mon, Nov 12, 2012 at 8:14 AM, Kenneth Heafield <[email protected]> wrote:

> Hi Jelita,
>
>         mgiza claims to support native Windows via Visual Studio, but it
> appears to have been tested only with 32-bit.  You're on your own to try
> to get it working on 64-bit Windows and to integrate it into the scripts.
>
> Kenneth
>
> On 11/12/12 16:09, Philipp Koehn wrote:
> > Hi,
> >
> > one (not completely satisfying) solution is to break up
> > the corpus and run MGIZA++ separately on each part.
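> >
> > Something along these lines (a rough sketch; the chunk size and file
> > names below are placeholders, and both sides must be split at exactly
> > the same line counts so the parts stay parallel):
> >
> >   # split both sides of the corpus into matching 250k-sentence chunks
> >   split -l 250000 corpus.lowercase.id part-id.
> >   split -l 250000 corpus.lowercase.en part-en.
> >   # then run the usual vocabulary/cooccurrence/mgiza steps on each
> >   # part-id.* / part-en.* pair and concatenate the A3.final outputs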
> >
> > -phi
> >
> > On Mon, Nov 12, 2012 at 4:56 AM, Jelita Asian
> > <[email protected]>  wrote:
> >> Hi Barry,
> >>
> >> Actually how do we solve the more than 2 GB memory problem? Thanks.
> >>
> >> Best regards,
> >>
> >> Jelita
> >>
> >>
> >> On Fri, Nov 9, 2012 at 10:51 AM, Jelita Asian <[email protected]>
> >> wrote:
> >>>
> >>> Hi Barry,
> >>>
> >>> Thanks. I will look into it now.
> >>>
> >>> Cheers,
> >>>
> >>> Jelita
> >>>
> >>>
> >>> On Thu, Nov 8, 2012 at 10:09 PM, Barry Haddow <[email protected]>
> >>> wrote:
> >>>>
> >>>> Hi Jelita
> >>>>
> >>>> It could be running out of memory. Under cygwin, mgiza will be
> >>>> limited to 2GB:
> >>>> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
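> >>>>
> >>>> A quick way to check (a rough sketch; it assumes the process shows up
> >>>> as mgiza.exe and uses Windows' tasklist, which is callable from a
> >>>> cygwin shell):
> >>>>
> >>>>   # print mgiza's process row (incl. memory usage) every 5 seconds
> >>>>   while true; do tasklist /FI "IMAGENAME eq mgiza.exe" | tail -n 1; sleep 5; done
> >>>>
> >>>> If the memory column climbs toward ~2GB just before the abort, the
> >>>> 2GB address-space limit is the likely culprit.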
> >>>>
> >>>> cheers - Barry
> >>>>
> >>>>
> >>>> On 06/11/12 07:33, Jelita Asian wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I run Moses training using the moses-for-mere-mortals scripts. The run
> >>>>> used to be OK. However, since I increased the number of words (mostly
> >>>>> numbers written out as words, which act as parallel sentences in the
> >>>>> corpus for Indonesian and English), I keep getting an mgiza stack dump,
> >>>>> and hence the training fails.
> >>>>>
> >>>>> Here is an extract from the log file of the run:
> >>>>>
> >>>>> -----------
> >>>>> Model1: Iteration 5
> >>>>> Reading more sentence pairs into memory ...
> >>>>> [sent:100000]
> >>>>> Reading more sentence pairs into memory ...
> >>>>> Reading more sentence pairs into memory ...
> >>>>> Reading more sentence pairs into memory ...
> >>>>> Model1: (5) TRAIN CROSS-ENTROPY 5.82453 PERPLEXITY 56.6706
> >>>>> Model1: (5) VITERBI TRAIN CROSS-ENTROPY 6.59753 PERPLEXITY 96.8401
> >>>>> Model 1 Iteration: 5 took: 87 seconds
> >>>>> Entire Model1 Training took: 444 seconds
> >>>>> NOTE: I am doing iterations with the HMM model!
> >>>>> Read classes: #words: 48562  #classes: 51
> >>>>> Actual number of read words: 48561 stored words: 48561
> >>>>> Read classes: #words: 45484  #classes: 51
> >>>>> Actual number of read words: 45483 stored words: 45483
> >>>>>
> >>>>> ==========================================================
> >>>>> Hmm Training Started at: Tue Nov  6 12:46:41 2012
> >>>>>
> >>>>> ./train-AllCorpusIndo.sh: line 1184:  3936 Aborted (core dumped)
> >>>>> $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
> >>>>> $modeldir/$lang2-$lang1-int-train.snt -o $modeldir/$lang2-$lang1 -s
> >>>>> $modeldir/$lang1.vcb -t $modeldir/$lang2.vcb -coocurrencefile
> >>>>> $modeldir/$lang1-$lang2.cooc -ml $ml -countincreasecutoff
> >>>>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
> >>>>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff
> >>>>> -probcutoff $probcutoff -probsmooth $probsmooth -m1 $model1iterations
> >>>>> -m2 $model2iterations -mh $hmmiterations -m3 $model3iterations
> >>>>> -m4 $model4iterations -m5 $model5iterations -m6 $model6iterations
> >>>>> -t1 $model1dumpfrequency -t2 $model2dumpfrequency
> >>>>> -t2to3 $transferdumpfrequency -t345 $model345dumpfrequency
> >>>>> -th $hmmdumpfrequency -onlyaldumps $onlyaldumps -nodumps $nodumps
> >>>>> -compactadtable $compactadtable -model4smoothfactor $model4smoothfactor
> >>>>> -compactalignmentformat $compactalignmentformat -verbose $verbose
> >>>>> -verbosesentence $verbosesentence -emalsmooth $emalsmooth
> >>>>> -model23smoothfactor $model23smoothfactor -model4smoothfactor
> >>>>> $model4smoothfactor -model5smoothfactor $model5smoothfactor
> >>>>> -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral
> >>>>> -deficientdistortionforemptyword $deficientdistortionforemptyword
> >>>>> -depm4 $depm4 -depm5 $depm5
> >>>>> -emalignmentdependencies $emalignmentdependencies
> >>>>> -emprobforempty $emprobforempty -m5p0 $m5p0
> >>>>> -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2
> >>>>> -manlexmaxmultiplicity $manlexmaxmultiplicity
> >>>>> -maxfertility $maxfertility -p0 $p0 -pegging $pegging
> >>>>> Starting MGIZA
> >>>>> Initializing Global Paras
> >>>>> DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments
> >>>>> Parameter 'ncpus' changed from '2' to '8'
> >>>>> Parameter 'c' changed from '' to
> >>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en-int-train.snt'
> >>>>> Parameter 'o' changed from '112-11-06.124815.Jelita' to
> >>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en'
> >>>>> Parameter 's' changed from '' to
> >>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en.vcb'
> >>>>> Parameter 't' changed from '' to
> >>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id.vcb'
> >>>>> Parameter 'coocurrencefile' changed from '' to
> >>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.cooc'
> >>>>> Parameter 'm3' changed from '5' to '3'
> >>>>> Parameter 'm4' changed from '5' to '3'
> >>>>> Parameter 'onlyaldumps' changed from '0' to '1'
> >>>>> Parameter 'nodumps' changed from '0' to '1'
> >>>>> Parameter 'model4smoothfactor' changed from '0.2' to '0.4'
> >>>>> Parameter 'nsmooth' changed from '64' to '4'
> >>>>> Parameter 'p0' changed from '-1' to '0.999'
> >>>>> general parameters:
> >>>>> -------------------
> >>>>> ml = 101  (maximum sentence length)
> >>>>>
> >>>>> Here is another extract
> >>>>>
> >>>>> ./train-AllCorpusIndo.sh: line 1184:  2756 Aborted (core dumped)
> >>>>> $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
> >>>>> $modeldir/$lang1-$lang2-int-train.snt -o $modeldir/$lang1-$lang2 -s
> >>>>> $modeldir/$lang2.vcb -t $modeldir/$lang1.vcb -coocurrencefile
> >>>>> $modeldir/$lang2-$lang1.cooc -ml $ml -countincreasecutoff
> >>>>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
> >>>>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff
> >>>>> -probcutoff $probcutoff -probsmooth $probsmooth -m1 $model1iterations
> >>>>> -m2 $model2iterations -mh $hmmiterations -m3 $model3iterations
> >>>>> -m4 $model4iterations -m5 $model5iterations -m6 $model6iterations
> >>>>> -t1 $model1dumpfrequency -t2 $model2dumpfrequency
> >>>>> -t2to3 $transferdumpfrequency -t345 $model345dumpfrequency
> >>>>> -th $hmmdumpfrequency -onlyaldumps $onlyaldumps -nodumps $nodumps
> >>>>> -compactadtable $compactadtable -model4smoothfactor $model4smoothfactor
> >>>>> -compactalignmentformat $compactalignmentformat -verbose $verbose
> >>>>> -verbosesentence $verbosesentence -emalsmooth $emalsmooth
> >>>>> -model23smoothfactor $model23smoothfactor -model4smoothfactor
> >>>>> $model4smoothfactor -model5smoothfactor $model5smoothfactor
> >>>>> -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral
> >>>>> -deficientdistortionforemptyword $deficientdistortionforemptyword
> >>>>> -depm4 $depm4 -depm5 $depm5
> >>>>> -emalignmentdependencies $emalignmentdependencies
> >>>>> -emprobforempty $emprobforempty -m5p0 $m5p0
> >>>>> -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2
> >>>>> -manlexmaxmultiplicity $manlexmaxmultiplicity
> >>>>> -maxfertility $maxfertility -p0 $p0 -pegging $pegging
> >>>>> ****** phase 2.1 of training (merge alignments)
> >>>>> Traceback (most recent call last):
> >>>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line 24, in <module>
> >>>>>     files.append(open(sys.argv[i],"r"));
> >>>>> IOError: [Errno 2] No such file or directory:
> >>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.part*'
> >>>>> Traceback (most recent call last):
> >>>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line 24, in <module>
> >>>>>     files.append(open(sys.argv[i],"r"));
> >>>>> IOError: [Errno 2] No such file or directory:
> >>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.part*'
> >>>>> ****** Rest of parallel training
> >>>>> Using SCRIPTS_ROOTDIR: /home/Jelita/moses/tools/moses/scripts
> >>>>> Using single-thread GIZA
> >>>>> (3) generate word alignment @ Tue Nov  6 13:07:31 SEAST 2012
> >>>>> Combining forward and inverted alignment from files:
> >>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.{bz2,gz}
> >>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.{bz2,gz}
> >>>>> Executing: mkdir -p
> >>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6
> >>>>> Executing:
> >>>>> /home/Jelita/moses/tools/moses/scripts/training/symal/giza2bal.pl -d
> >>>>> "gzip -cd /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.gz"
> >>>>> -i "gzip -cd /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.gz"
> >>>>> | /home/Jelita/moses/tools/moses/scripts/training/symal/symal
> >>>>> -alignment="grow" -diagonal="yes" -final="yes" -both="yes" >
> >>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/aligned.grow-diag-final-and
> >>>>>
> >>>>> symal: computing grow alignment: diagonal (1) final (1) both-uncovered (1)
> >>>>> skip=<0>  counts=<0>
> >>>>> (4) generate lexical translation table 0-0 @ Tue Nov  6 13:07:31 SEAST 2012
> >>>>>
> >>>>> (/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.id,/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.en,/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/lex)
> >>>>>
> >>>>> Use of uninitialized value $a in scalar chomp at
> >>>>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1079.
> >>>>> Use of uninitialized value $a in split at
> >>>>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1082.
> >>>>>
> >>>>> What is the cause? I use cygwin on Windows 7 on a 64-bit machine.
> >>>>> I ran it a few times and it can't get past the Model 1 training.
> >>>>>
> >>>>> Thanks.
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Jelita
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> The University of Edinburgh is a charitable body, registered in
> >>>> Scotland, with registration number SC005336.
> >>>>
> >>>
> >>
> >>
> >>
> >
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
