Hi Barry,

Actually, how do we solve the 2 GB memory limit problem? Thanks.
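For reference, a quick sanity check of whether the environment is 32-bit, and therefore capped near 2 GB of address space per process, can be done from the shell (a minimal sketch; the mgiza path in the comment reuses the $toolsdir variable from the training script and is only illustrative):

```shell
# Report the word size of the current environment: prints 32 under a
# 32-bit cygwin (whose processes are limited to roughly 2 GB of
# address space), 64 on a 64-bit environment.
getconf LONG_BIT

# The mgiza binary itself can be inspected as well, e.g.
#   file "$toolsdir/mgiza/bin/mgiza"
# which reports whether the executable was built 32-bit or 64-bit.
```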

Best regards,

Jelita

On Fri, Nov 9, 2012 at 10:51 AM, Jelita Asian
<[email protected]>wrote:

> Hi Barry,
>
> Thanks. I will look into it now.
>
> Cheers,
>
> Jelita
>
>
> On Thu, Nov 8, 2012 at 10:09 PM, Barry Haddow 
> <[email protected]>wrote:
>
>> Hi Jelita
>>
>> It could be running out of memory. Under cygwin, mgiza will be limited to
>> 2GB
>> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
>>
>> cheers - Barry
>>
>>
>> On 06/11/12 07:33, Jelita Asian wrote:
>>
>>> Hi,
>>>
>>> I run Moses training using the moses-for-mere-mortals scripts. The run
>>> used to be OK. However, since I increased the number of words (mostly
>>> numbers written out as words, which act as parallel sentences in the
>>> corpus for Indonesian and English), I keep getting an mgiza stack dump,
>>> and hence the training fails.
>>>
>>> Here is the extract for the log file of the run:
>>>
>>> -----------
>>> Model1: Iteration 5
>>> Reading more sentence pairs into memory ...
>>> [sent:100000]
>>> Reading more sentence pairs into memory ...
>>> Reading more sentence pairs into memory ...
>>> Reading more sentence pairs into memory ...
>>> Model1: (5) TRAIN CROSS-ENTROPY 5.82453 PERPLEXITY 56.6706
>>> Model1: (5) VITERBI TRAIN CROSS-ENTROPY 6.59753 PERPLEXITY 96.8401
>>> Model 1 Iteration: 5 took: 87 seconds
>>> Entire Model1 Training took: 444 seconds
>>> NOTE: I am doing iterations with the HMM model!
>>> Read classes: #words: 48562  #classes: 51
>>> Actual number of read words: 48561 stored words: 48561
>>> Read classes: #words: 45484  #classes: 51
>>> Actual number of read words: 45483 stored words: 45483
>>>
>>> ============================================================
>>> Hmm Training Started at: Tue Nov  6 12:46:41 2012
>>>
>>> ./train-AllCorpusIndo.sh: line 1184:  3936 Aborted                 (core
>>> dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
>>> $modeldir/$lang2-$lang1-int-train.snt -o $modeldir/$lang2-$lang1 -s
>>> $modeldir/$lang1.vcb -t $modeldir/$lang2.vcb -coocurrencefile
>>> $modeldir/$lang1-$lang2.cooc -ml $ml -countincreasecutoff
>>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff
>>> $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2
>>> $model2iterations -mh $hmmiterations -m3 $model3iterations -m4
>>> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1
>>> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency
>>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps
>>> $onlyaldumps -nodumps $nodumps -compactadtable $compactadtable
>>> -model4smoothfactor $model4smoothfactor -compactalignmentformat
>>> $compactalignmentformat -verbose $verbose -verbosesentence $verbosesentence
>>> -emalsmooth $emalsmooth -model23smoothfactor $model23smoothfactor
>>> -model4smoothfactor $model4smoothfactor -model5smoothfactor
>>> $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral
>>> -deficientdistortionforemptyword $deficientdistortionforemptyword
>>> -depm4 $depm4 -depm5 $depm5 -emalignmentdependencies
>>> $emalignmentdependencies -emprobforempty $emprobforempty -m5p0 $m5p0
>>> -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2
>>> -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility $maxfertility
>>> -p0 $p0 -pegging $pegging
>>> Starting MGIZA
>>> Initializing Global Paras
>>> DEBUG: Enter
>>> DEBUG: Prefix
>>> DEBUG: Log
>>> Parsing Arguments
>>> Parameter 'ncpus' changed from '2' to '8'
>>> Parameter 'c' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en-int-train.snt'
>>> Parameter 'o' changed from '112-11-06.124815.Jelita' to
>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en'
>>> Parameter 's' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en.vcb'
>>> Parameter 't' changed from '' to '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id.vcb'
>>> Parameter 'coocurrencefile' changed from '' to
>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.cooc'
>>> Parameter 'm3' changed from '5' to '3'
>>> Parameter 'm4' changed from '5' to '3'
>>> Parameter 'onlyaldumps' changed from '0' to '1'
>>> Parameter 'nodumps' changed from '0' to '1'
>>> Parameter 'model4smoothfactor' changed from '0.2' to '0.4'
>>> Parameter 'nsmooth' changed from '64' to '4'
>>> Parameter 'p0' changed from '-1' to '0.999'
>>> general parameters:
>>> -------------------
>>> ml = 101  (maximum sentence length)
>>>
>>> Here is another extract
>>>
>>> ./train-AllCorpusIndo.sh: line 1184:  2756 Aborted                 (core
>>> dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
>>> $modeldir/$lang1-$lang2-int-train.snt -o $modeldir/$lang1-$lang2 -s
>>> $modeldir/$lang2.vcb -t $modeldir/$lang1.vcb -coocurrencefile
>>> $modeldir/$lang2-$lang1.cooc -ml $ml -countincreasecutoff
>>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff
>>> $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2
>>> $model2iterations -mh $hmmiterations -m3 $model3iterations -m4
>>> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1
>>> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency
>>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps
>>> $onlyaldumps -nodumps $nodumps -compactadtable $compactadtable
>>> -model4smoothfactor $model4smoothfactor -compactalignmentformat
>>> $compactalignmentformat -verbose $verbose -verbosesentence $verbosesentence
>>> -emalsmooth $emalsmooth -model23smoothfactor $model23smoothfactor
>>> -model4smoothfactor $model4smoothfactor -model5smoothfactor
>>> $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral
>>> -deficientdistortionforemptyword $deficientdistortionforemptyword
>>> -depm4 $depm4 -depm5 $depm5 -emalignmentdependencies
>>> $emalignmentdependencies -emprobforempty $emprobforempty -m5p0 $m5p0
>>> -manlexfactor1 $manlexfactor1 -manlexfactor2 $manlexfactor2
>>> -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility $maxfertility
>>> -p0 $p0 -pegging $pegging
>>> ****** phase 2.1 of training (merge alignments)
>>> Traceback (most recent call last):
>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py",
>>> line 24, in <module>
>>>     files.append(open(sys.argv[i],"r"));
>>> IOError: [Errno 2] No such file or directory:
>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.part*'
>>> Traceback (most recent call last):
>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py",
>>> line 24, in <module>
>>>     files.append(open(sys.argv[i],"r"));
>>> IOError: [Errno 2] No such file or directory:
>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.part*'
>>> ****** Rest of parallel training
>>> Using SCRIPTS_ROOTDIR: /home/Jelita/moses/tools/moses/scripts
>>> Using single-thread GIZA
>>> (3) generate word alignment @ Tue Nov  6 13:07:31 SEAST 2012
>>> Combining forward and inverted alignment from files:
>>>   /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.{bz2,gz}
>>>   /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.{bz2,gz}
>>> Executing: mkdir -p /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6
>>>  Executing: /home/Jelita/moses/tools/moses/scripts/training/symal/giza2bal.pl -d "gzip -cd /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.gz" -i "gzip -cd /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.gz" | /home/Jelita/moses/tools/moses/scripts/training/symal/symal -alignment="grow" -diagonal="yes" -final="yes" -both="yes" > /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/aligned.grow-diag-final-and
>>>
>>> symal: computing grow alignment: diagonal (1) final (1)both-uncovered (1)
>>> skip=<0> counts=<0>
>>> (4) generate lexical translation table 0-0 @ Tue Nov  6 13:07:31 SEAST
>>> 2012
>>> (/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.id,/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.en,/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/lex)
>>>
>>> Use of uninitialized value $a in scalar chomp at
>>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl
>>> line 1079.
>>> Use of uninitialized value $a in split at
>>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl
>>> line 1082.
>>>
>>> What is the cause? I use cygwin for Windows 7 on a 64-bit machine.
>>> I ran it a few times and it can't get past the Model 1 training.
>>>
>>> Thanks.
>>>
>>> Best regards,
>>>
>>> Jelita
>>>
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>
>>
>> --
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>>
>
