Hi Jelita,
mgiza claims to support native Windows via Visual Studio, but it
appears to have been tested only with 32-bit builds. You're on your
own trying to get it working on 64-bit Windows and integrating it
into the scripts.
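
If you do want to try, the cmake build is probably the place to start.
A rough sketch, untested on my end (the generator name depends on your
Visual Studio version, "path\to\mgizapp" is a placeholder, and I'm
assuming the mgizapp CMakeLists works with the 64-bit generator):

  REM from a Visual Studio command prompt, in a fresh build directory
  cmake -G "Visual Studio 10 Win64" path\to\mgizapp
  cmake --build . --config Release

Even if that compiles, you'd still have to point the scripts at the
resulting binaries yourself.
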
Kenneth
On 11/12/12 16:09, Philipp Koehn wrote:
> Hi,
>
> one (not completely satisfying) solution is to break up
> the corpus and run MGIZA++ separately on each part.
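>
> A rough sketch of what I mean (untested; the 500k line count and the
> run_mgiza_on wrapper are placeholders for however you invoke MGIZA++,
> and you would still need to merge the per-part alignments afterwards):
>
>   # split both sides of the parallel corpus into equal-sized chunks
>   split -l 500000 corpus.id corpus.id.part.
>   split -l 500000 corpus.en corpus.en.part.
>   # align each pair of chunks independently
>   for f in corpus.id.part.*; do
>     suffix=${f#corpus.id.part.}
>     run_mgiza_on corpus.id.part.$suffix corpus.en.part.$suffix
>   done
>
> Since each part is aligned in isolation, the translation statistics
> are estimated per part rather than over the whole corpus, which is
> why this is not completely satisfying.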
>
> -phi
>
> On Mon, Nov 12, 2012 at 4:56 AM, Jelita Asian
> <[email protected]> wrote:
>> Hi Barry,
>>
>> Actually, how do we solve the more-than-2GB memory problem? Thanks.
>>
>> Best regards,
>>
>> Jelita
>>
>>
>> On Fri, Nov 9, 2012 at 10:51 AM, Jelita Asian<[email protected]>
>> wrote:
>>>
>>> Hi Barry,
>>>
>>> Thanks. I will look into it now.
>>>
>>> Cheers,
>>>
>>> Jelita
>>>
>>>
>>> On Thu, Nov 8, 2012 at 10:09 PM, Barry Haddow<[email protected]>
>>> wrote:
>>>>
>>>> Hi Jelita
>>>>
>>>> It could be running out of memory. Under cygwin, mgiza will be limited to
>>>> 2GB
>>>> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
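>>>>
>>>> A quick way to check whether you are hitting that ceiling is to watch
>>>> the mgiza process in the Windows Task Manager while training runs: if
>>>> it dies as it approaches 2GB, that is very likely the problem. From
>>>> the cygwin shell you can also sanity-check the shell's own limit
>>>> (this reports the ulimit setting, not the 2GB cap that cygwin puts
>>>> on 32-bit processes):
>>>>
>>>>   # virtual memory limit for child processes, in kB
>>>>   ulimit -v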
>>>>
>>>> cheers - Barry
>>>>
>>>>
>>>> On 06/11/12 07:33, Jelita Asian wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I run Moses training using the Moses for Mere Mortals scripts. The
>>>>> run used to be OK. However, since I increased the number of words
>>>>> (mostly numbers written out as words, acting as parallel sentences
>>>>> in the corpus for Indonesian and English), I keep getting an mgiza
>>>>> stack dump, so the training fails.
>>>>>
>>>>> Here is an extract from the log file of the run:
>>>>>
>>>>> -----------
>>>>> Model1: Iteration 5
>>>>> Reading more sentence pairs into memory ...
>>>>> [sent:100000]
>>>>> Reading more sentence pairs into memory ...
>>>>> Reading more sentence pairs into memory ...
>>>>> Reading more sentence pairs into memory ...
>>>>> Model1: (5) TRAIN CROSS-ENTROPY 5.82453 PERPLEXITY 56.6706
>>>>> Model1: (5) VITERBI TRAIN CROSS-ENTROPY 6.59753 PERPLEXITY 96.8401
>>>>> Model 1 Iteration: 5 took: 87 seconds
>>>>> Entire Model1 Training took: 444 seconds
>>>>> NOTE: I am doing iterations with the HMM model!
>>>>> Read classes: #words: 48562 #classes: 51
>>>>> Actual number of read words: 48561 stored words: 48561
>>>>> Read classes: #words: 45484 #classes: 51
>>>>> Actual number of read words: 45483 stored words: 45483
>>>>>
>>>>> ==========================================================
>>>>> Hmm Training Started at: Tue Nov 6 12:46:41 2012
>>>>>
>>>>> ./train-AllCorpusIndo.sh: line 1184: 3936 Aborted (core
>>>>> dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
>>>>> $modeldir/$lang2-$lang1-int-train.snt -o $modeldir/$lang2-$lang1 -s
>>>>> $modeldir/$lang1.vcb -t $modeldir/$lang2.vcb -coocurrencefile
>>>>> $modeldir/$lang1-$lang2.cooc -ml $ml -countincreasecutoff
>>>>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>>>>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff
>>>>> -probcutoff
>>>>> $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2
>>>>> $model2iterations -mh $hmmiterations -m3 $model3iterations -m4
>>>>> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1
>>>>> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3
>>>>> $transferdumpfrequency
>>>>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps
>>>>> $onlyaldumps
>>>>> -nodumps $nodumps -compactadtable $compactadtable -model4smoothfactor
>>>>> $model4smoothfactor -compactalignmentformat $compactalignmentformat
>>>>> -verbose
>>>>> $verbose -verbosesentence $verbosesentence -emalsmooth $emalsmooth
>>>>> -model23smoothfactor $model23smoothfactor -model4smoothfactor
>>>>> $model4smoothfactor -model5smoothfactor $model5smoothfactor -nsmooth
>>>>> $nsmooth -nsmoothgeneral $nsmoothgeneral -deficientdistortionforemptyword
>>>>> $deficientdistortionforemptyword -depm4 $depm4 -depm5 $depm5
>>>>> -emalignmentdependencies $emalignmentdependencies -emprobforempty
>>>>> $emprobforempty -m5p0 $m5p0 -manlexfactor1 $manlexfactor1 -manlexfactor2
>>>>> $manlexfactor2 -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility
>>>>> $maxfertility -p0 $p0 -pegging $pegging
>>>>> Starting MGIZA
>>>>> Initializing Global Paras
>>>>> DEBUG: Enter
>>>>> DEBUG: Prefix
>>>>> DEBUG: Log
>>>>> Parsing Arguments
>>>>> Parameter 'ncpus' changed from '2' to '8'
>>>>> Parameter 'c' changed from '' to
>>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en-int-train.snt'
>>>>> Parameter 'o' changed from '112-11-06.124815.Jelita' to
>>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en'
>>>>> Parameter 's' changed from '' to
>>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en.vcb'
>>>>> Parameter 't' changed from '' to
>>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id.vcb'
>>>>> Parameter 'coocurrencefile' changed from '' to
>>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.cooc'
>>>>> Parameter 'm3' changed from '5' to '3'
>>>>> Parameter 'm4' changed from '5' to '3'
>>>>> Parameter 'onlyaldumps' changed from '0' to '1'
>>>>> Parameter 'nodumps' changed from '0' to '1'
>>>>> Parameter 'model4smoothfactor' changed from '0.2' to '0.4'
>>>>> Parameter 'nsmooth' changed from '64' to '4'
>>>>> Parameter 'p0' changed from '-1' to '0.999'
>>>>> general parameters:
>>>>> -------------------
>>>>> ml = 101 (maximum sentence length)
>>>>>
>>>>> Here is another extract
>>>>>
>>>>> ./train-AllCorpusIndo.sh: line 1184: 2756 Aborted (core
>>>>> dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
>>>>> $modeldir/$lang1-$lang2-int-train.snt -o $modeldir/$lang1-$lang2 -s
>>>>> $modeldir/$lang2.vcb -t $modeldir/$lang1.vcb -coocurrencefile
>>>>> $modeldir/$lang2-$lang1.cooc -ml $ml -countincreasecutoff
>>>>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>>>>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff
>>>>> -probcutoff
>>>>> $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2
>>>>> $model2iterations -mh $hmmiterations -m3 $model3iterations -m4
>>>>> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1
>>>>> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3
>>>>> $transferdumpfrequency
>>>>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps
>>>>> $onlyaldumps
>>>>> -nodumps $nodumps -compactadtable $compactadtable -model4smoothfactor
>>>>> $model4smoothfactor -compactalignmentformat $compactalignmentformat
>>>>> -verbose
>>>>> $verbose -verbosesentence $verbosesentence -emalsmooth $emalsmooth
>>>>> -model23smoothfactor $model23smoothfactor -model4smoothfactor
>>>>> $model4smoothfactor -model5smoothfactor $model5smoothfactor -nsmooth
>>>>> $nsmooth -nsmoothgeneral $nsmoothgeneral -deficientdistortionforemptyword
>>>>> $deficientdistortionforemptyword -depm4 $depm4 -depm5 $depm5
>>>>> -emalignmentdependencies $emalignmentdependencies -emprobforempty
>>>>> $emprobforempty -m5p0 $m5p0 -manlexfactor1 $manlexfactor1 -manlexfactor2
>>>>> $manlexfactor2 -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility
>>>>> $maxfertility -p0 $p0 -pegging $pegging
>>>>> ****** phase 2.1 of training (merge alignments)
>>>>> Traceback (most recent call last):
>>>>> File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line
>>>>> 24, in<module>
>>>>> files.append(open(sys.argv[i],"r"));
>>>>> IOError: [Errno 2] No such file or directory:
>>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.part*'
>>>>> Traceback (most recent call last):
>>>>> File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line
>>>>> 24, in<module>
>>>>> files.append(open(sys.argv[i],"r"));
>>>>> IOError: [Errno 2] No such file or directory:
>>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.part*'
>>>>> ****** Rest of parallel training
>>>>> Using SCRIPTS_ROOTDIR: /home/Jelita/moses/tools/moses/scripts
>>>>> Using single-thread GIZA
>>>>> (3) generate word alignment @ Tue Nov 6 13:07:31 SEAST 2012
>>>>> Combining forward and inverted alignment from files:
>>>>>
>>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.{bz2,gz}
>>>>>
>>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.{bz2,gz}
>>>>> Executing: mkdir -p
>>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6
>>>>> Executing:
>>>>> /home/Jelita/moses/tools/moses/scripts/training/symal/giza2bal.pl
>>>>> -d "gzip -cd
>>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.gz"
>>>>> -i "gzip -cd
>>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.gz"
>>>>> |/home/Jelita/moses/tools/moses/scripts/training/symal/symal
>>>>> -alignment="grow" -diagonal="yes" -final="yes" -both="yes">
>>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/aligned.grow-diag-final-and
>>>>>
>>>>> symal: computing grow alignment: diagonal (1) final (1)both-uncovered
>>>>> (1)
>>>>> skip=<0> counts=<0>
>>>>> (4) generate lexical translation table 0-0 @ Tue Nov 6 13:07:31 SEAST
>>>>> 2012
>>>>>
>>>>> (/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.id
>>>>> ,/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.en,/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/lex)
>>>>>
>>>>> Use of uninitialized value $a in scalar chomp at
>>>>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line
>>>>> 1079.
>>>>> Use of uninitialized value $a in split at
>>>>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line
>>>>> 1082.
>>>>>
>>>>> What is the cause? I use cygwin for Windows 7 on a 64-bit machine.
>>>>> I have run it a few times and it can't get past the Model 1 training.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Jelita
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> The University of Edinburgh is a charitable body, registered in
>>>> Scotland, with registration number SC005336.
>>>>
>>>
>>
>>
>>
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support