Hi Barry,

Thanks. I will look into it now.
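
In the meantime, a quick check along these lines might confirm that mgiza aborted before writing any alignment parts. That is what the merge_alignment.py IOError in the quoted log suggests: the pattern 'id-en.A3.final.part*' reached open() unexpanded, presumably because nothing matched it. This is only a sketch. find_alignment_parts and check_parts are hypothetical helper names, not part of mgiza or Moses, and the '<prefix>.A3.final.part*' naming is taken from the error message alone.

```python
import glob


def find_alignment_parts(prefix):
    """Return the sorted files matching '<prefix>.A3.final.part*'.

    The naming scheme is an assumption read off the IOError in the log.
    """
    return sorted(glob.glob(prefix + ".A3.final.part*"))


def check_parts(prefixes):
    """Look up part files for each output prefix.

    Returns (found, missing): a dict mapping each prefix to its part
    files, and a list of the prefixes for which no part file exists.
    """
    found = {}
    missing = []
    for prefix in prefixes:
        parts = find_alignment_parts(prefix)
        found[prefix] = parts
        if not parts:
            # Nothing on disk for this direction: mgiza most likely
            # stopped before dumping its alignments.
            missing.append(prefix)
    return found, missing
```

Calling check_parts with the id-en and en-id output prefixes from the log would list any direction that has no part files, so the training script could stop before the merge step rather than fail later in train-model.perl.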

Cheers,

Jelita

On Thu, Nov 8, 2012 at 10:09 PM, Barry Haddow <[email protected]>wrote:

> Hi Jelita
>
> It could be running out of memory. Under cygwin, mgiza will be limited to
> 2GB:
> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
>
> cheers - Barry
>
>
> On 06/11/12 07:33, Jelita Asian wrote:
>
>> Hi,
>>
>> I run Moses training using the moses-for-mere-mortal scripts. The runs used
>> to be OK. However, since I increased the number of words (mostly numbers
>> written out in words, where the words act as parallel sentences in the
>> Indonesian-English corpus), I keep getting an mgiza stack dump, so the
>> training fails.
>>
>> Here is the extract for the log file of the run:
>>
>> -----------
>> Model1: Iteration 5
>> Reading more sentence pairs into memory ...
>> [sent:100000]
>> Reading more sentence pairs into memory ...
>> Reading more sentence pairs into memory ...
>> Reading more sentence pairs into memory ...
>> Model1: (5) TRAIN CROSS-ENTROPY 5.82453 PERPLEXITY 56.6706
>> Model1: (5) VITERBI TRAIN CROSS-ENTROPY 6.59753 PERPLEXITY 96.8401
>> Model 1 Iteration: 5 took: 87 seconds
>> Entire Model1 Training took: 444 seconds
>> NOTE: I am doing iterations with the HMM model!
>> Read classes: #words: 48562  #classes: 51
>> Actual number of read words: 48561 stored words: 48561
>> Read classes: #words: 45484  #classes: 51
>> Actual number of read words: 45483 stored words: 45483
>>
>> ==========================================================
>> Hmm Training Started at: Tue Nov  6 12:46:41 2012
>>
>> ./train-AllCorpusIndo.sh: line 1184:  3936 Aborted                 (core
>> dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
>> $modeldir/$lang2-$lang1-int-train.snt -o $modeldir/$lang2-$lang1 -s
>> $modeldir/$lang1.vcb -t $modeldir/$lang2.vcb -coocurrencefile
>> $modeldir/$lang1-$lang2.cooc -ml $ml -countincreasecutoff
>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff
>> $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2
>> $model2iterations -mh $hmmiterations -m3 $model3iterations -m4
>> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1
>> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency
>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps
>> $onlyaldumps -nodumps $nodumps -compactadtable $compactadtable
>> -model4smoothfactor $model4smoothfactor -compactalignmentformat
>> $compactalignmentformat -verbose $verbose -verbosesentence $verbosesentence
>> -emalsmooth $emalsmooth -model23smoothfactor $model23smoothfactor
>> -model4smoothfactor $model4smoothfactor -model5smoothfactor
>> $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral
>> -deficientdistortionforemptyword $deficientdistortionforemptyword
>> -depm4 $depm4 -depm5 $depm5 -emalignmentdependencies
>> $emalignmentdependencies -emprobforempty $emprobforempty -m5p0 $m5p0
>> -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2
>> -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility $maxfertility
>> -p0 $p0 -pegging $pegging
>> Starting MGIZA
>> Initializing Global Paras
>> DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments
>> Parameter 'ncpus' changed from '2' to '8'
>> Parameter 'c' changed from '' to
>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en-int-train.snt'
>> Parameter 'o' changed from '112-11-06.124815.Jelita' to
>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en'
>> Parameter 's' changed from '' to
>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en.vcb'
>> Parameter 't' changed from '' to
>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id.vcb'
>> Parameter 'coocurrencefile' changed from '' to
>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.cooc'
>> Parameter 'm3' changed from '5' to '3'
>> Parameter 'm4' changed from '5' to '3'
>> Parameter 'onlyaldumps' changed from '0' to '1'
>> Parameter 'nodumps' changed from '0' to '1'
>> Parameter 'model4smoothfactor' changed from '0.2' to '0.4'
>> Parameter 'nsmooth' changed from '64' to '4'
>> Parameter 'p0' changed from '-1' to '0.999'
>> general parameters:
>> -------------------
>> ml = 101  (maximum sentence length)
>>
>> Here is another extract:
>>
>> ./train-AllCorpusIndo.sh: line 1184:  2756 Aborted                 (core
>> dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
>> $modeldir/$lang1-$lang2-int-train.snt -o $modeldir/$lang1-$lang2 -s
>> $modeldir/$lang2.vcb -t $modeldir/$lang1.vcb -coocurrencefile
>> $modeldir/$lang2-$lang1.cooc -ml $ml -countincreasecutoff
>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff
>> $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2
>> $model2iterations -mh $hmmiterations -m3 $model3iterations -m4
>> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1
>> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency
>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps
>> $onlyaldumps -nodumps $nodumps -compactadtable $compactadtable
>> -model4smoothfactor $model4smoothfactor -compactalignmentformat
>> $compactalignmentformat -verbose $verbose -verbosesentence $verbosesentence
>> -emalsmooth $emalsmooth -model23smoothfactor $model23smoothfactor
>> -model4smoothfactor $model4smoothfactor -model5smoothfactor
>> $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral
>> -deficientdistortionforemptyword $deficientdistortionforemptyword
>> -depm4 $depm4 -depm5 $depm5 -emalignmentdependencies
>> $emalignmentdependencies -emprobforempty $emprobforempty -m5p0 $m5p0
>> -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2
>> -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility $maxfertility
>> -p0 $p0 -pegging $pegging
>> ****** phase 2.1 of training (merge alignments)
>> Traceback (most recent call last):
>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py",
>> line 24, in <module>
>>     files.append(open(sys.argv[i],"r"));
>> IOError: [Errno 2] No such file or directory:
>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.part*'
>> Traceback (most recent call last):
>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py",
>> line 24, in <module>
>>     files.append(open(sys.argv[i],"r"));
>> IOError: [Errno 2] No such file or directory:
>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.part*'
>> ****** Rest of parallel training
>> Using SCRIPTS_ROOTDIR: /home/Jelita/moses/tools/moses/scripts
>> Using single-thread GIZA
>> (3) generate word alignment @ Tue Nov  6 13:07:31 SEAST 2012
>> Combining forward and inverted alignment from files:
>>   /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.{bz2,gz}
>>   /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.{bz2,gz}
>> Executing: mkdir -p
>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6
>> Executing: /home/Jelita/moses/tools/moses/scripts/training/symal/giza2bal.pl
>> -d "gzip -cd
>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.gz"
>> -i "gzip -cd
>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.gz"
>> |/home/Jelita/moses/tools/moses/scripts/training/symal/symal
>> -alignment="grow" -diagonal="yes" -final="yes" -both="yes" >
>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/aligned.grow-diag-final-and
>>
>> symal: computing grow alignment: diagonal (1) final (1)both-uncovered (1)
>> skip=<0> counts=<0>
>> (4) generate lexical translation table 0-0 @ Tue Nov  6 13:07:31 SEAST
>> 2012
>> (/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.id,
>> /home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.en,
>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/lex)
>>
>> !Use of uninitialized value $a in scalar chomp at
>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl
>> line 1079.
>> Use of uninitialized value $a in split at
>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1082.
>>
>> What is the cause? I use cygwin on Windows 7 on a 64-bit machine.
>> I ran it a few times and it can't get past the Model 1 training.
>>
>> Thanks.
>>
>> Best regards,
>>
>> Jelita
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
