Hi,

One (not completely satisfying) solution is to break up
the corpus and run MGIZA++ separately on each part.
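
A minimal sketch of that workaround, assuming GNU coreutils and hypothetical
file names (corpus.id / corpus.en standing in for the two sides of the
parallel corpus; the real mgiza invocation is only indicated in a comment):

```shell
#!/bin/sh
set -e

# Toy stand-ins for the real parallel corpus files (hypothetical names).
printf 'a\nb\nc\nd\n' > corpus.id
printf 'A\nB\nC\nD\n' > corpus.en

# Split each side into 2 chunks on line boundaries (-n l/2), with
# numeric suffixes (-d), so sentence pairs stay in sync:
# part.id.00 pairs with part.en.00, part.id.01 with part.en.01.
split -n l/2 -d corpus.id part.id.
split -n l/2 -d corpus.en part.en.

# Each chunk pair would then get its own MGIZA++ run (same flags as in
# the failing command), each run staying under the 2 GB cygwin limit:
#   mgiza -ncpus 8 -c part00-int-train.snt -o part00 ...
wc -l part.id.00 part.id.01 part.en.00 part.en.01
```

The split must be line-based so that the nth line of each side still forms a
sentence pair; a byte-based split would desynchronise the two files. The
alignments produced by the separate runs then still have to be concatenated
afterwards, which is why this is not a completely satisfying solution.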

-phi

On Mon, Nov 12, 2012 at 4:56 AM, Jelita Asian
<[email protected]> wrote:
> Hi Barry,
>
> Actually, how do we solve the more-than-2-GB memory problem? Thanks.
>
> Best regards,
>
> Jelita
>
>
> On Fri, Nov 9, 2012 at 10:51 AM, Jelita Asian <[email protected]>
> wrote:
>>
>> Hi Barry,
>>
>> Thanks. I will look into it now.
>>
>> Cheers,
>>
>> Jelita
>>
>>
>> On Thu, Nov 8, 2012 at 10:09 PM, Barry Haddow <[email protected]>
>> wrote:
>>>
>>> Hi Jelita
>>>
>>> It could be running out of memory. Under cygwin, mgiza will be limited to
>>> 2GB
>>> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
>>>
>>> cheers - Barry
>>>
>>>
>>> On 06/11/12 07:33, Jelita Asian wrote:
>>>>
>>>> Hi,
>>>>
>>>> I run Moses training using the moses-for-mere-mortals scripts. The runs
>>>> used to be OK. However, since I increased the number of words (mostly
>>>> numbers written out as words, which act as parallel sentences in the
>>>> Indonesian-English corpus), I keep getting an mgiza stack dump, so the
>>>> training fails.
>>>>
>>>> Here is the extract for the log file of the run:
>>>>
>>>> -----------
>>>> Model1: Iteration 5
>>>> Reading more sentence pairs into memory ...
>>>> [sent:100000]
>>>> Reading more sentence pairs into memory ...
>>>> Reading more sentence pairs into memory ...
>>>> Reading more sentence pairs into memory ...
>>>> Model1: (5) TRAIN CROSS-ENTROPY 5.82453 PERPLEXITY 56.6706
>>>> Model1: (5) VITERBI TRAIN CROSS-ENTROPY 6.59753 PERPLEXITY 96.8401
>>>> Model 1 Iteration: 5 took: 87 seconds
>>>> Entire Model1 Training took: 444 seconds
>>>> NOTE: I am doing iterations with the HMM model!
>>>> Read classes: #words: 48562  #classes: 51
>>>> Actual number of read words: 48561 stored words: 48561
>>>> Read classes: #words: 45484  #classes: 51
>>>> Actual number of read words: 45483 stored words: 45483
>>>>
>>>> ==========================================================
>>>> Hmm Training Started at: Tue Nov  6 12:46:41 2012
>>>>
>>>> ./train-AllCorpusIndo.sh: line 1184:  3936 Aborted                 (core
>>>> dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
>>>> $modeldir/$lang2-$lang1-int-train.snt -o $modeldir/$lang2-$lang1 -s
>>>> $modeldir/$lang1.vcb -t $modeldir/$lang2.vcb -coocurrencefile
>>>> $modeldir/$lang1-$lang2.cooc -ml $ml -countincreasecutoff
>>>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>>>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff
>>>> $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2
>>>> $model2iterations -mh $hmmiterations -m3 $model3iterations -m4
>>>> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1
>>>> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency
>>>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps 
>>>> $onlyaldumps
>>>> -nodumps $nodumps -compactadtable $compactadtable -model4smoothfactor
>>>> $model4smoothfactor -compactalignmentformat $compactalignmentformat 
>>>> -verbose
>>>> $verbose -verbosesentence $verbosesentence -emalsmooth $emalsmooth
>>>> -model23smoothfactor $model23smoothfactor -model4smoothfactor
>>>> $model4smoothfactor -model5smoothfactor $model5smoothfactor -nsmooth
>>>> $nsmooth -nsmoothgeneral $nsmoothgeneral -deficientdistortionforemptyword
>>>> $deficientdistortionforemptyword -depm4 $depm4 -depm5 $depm5
>>>> -emalignmentdependencies $emalignmentdependencies -emprobforempty
>>>> $emprobforempty -m5p0 $m5p0 -manlexfactor1 $manlexfactor1 -manlexfactor2
>>>> $manlexfactor2 -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility
>>>> $maxfertility -p0 $p0 -pegging $pegging
>>>> Starting MGIZA
>>>> Initializing Global Paras
>>>> DEBUG: Enter
>>>> DEBUG: Prefix
>>>> DEBUG: Log
>>>> Parsing Arguments
>>>> Parameter 'ncpus' changed from '2' to '8'
>>>> Parameter 'c' changed from '' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en-int-train.snt'
>>>> Parameter 'o' changed from '112-11-06.124815.Jelita' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en'
>>>> Parameter 's' changed from '' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en.vcb'
>>>> Parameter 't' changed from '' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id.vcb'
>>>> Parameter 'coocurrencefile' changed from '' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.cooc'
>>>> Parameter 'm3' changed from '5' to '3'
>>>> Parameter 'm4' changed from '5' to '3'
>>>> Parameter 'onlyaldumps' changed from '0' to '1'
>>>> Parameter 'nodumps' changed from '0' to '1'
>>>> Parameter 'model4smoothfactor' changed from '0.2' to '0.4'
>>>> Parameter 'nsmooth' changed from '64' to '4'
>>>> Parameter 'p0' changed from '-1' to '0.999'
>>>> general parameters:
>>>> -------------------
>>>> ml = 101  (maximum sentence length)
>>>>
>>>> Here is another extract
>>>>
>>>> ./train-AllCorpusIndo.sh: line 1184:  2756 Aborted                 (core
>>>> dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
>>>> $modeldir/$lang1-$lang2-int-train.snt -o $modeldir/$lang1-$lang2 -s
>>>> $modeldir/$lang2.vcb -t $modeldir/$lang1.vcb -coocurrencefile
>>>> $modeldir/$lang2-$lang1.cooc -ml $ml -countincreasecutoff
>>>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>>>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff
>>>> $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2
>>>> $model2iterations -mh $hmmiterations -m3 $model3iterations -m4
>>>> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1
>>>> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency
>>>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps 
>>>> $onlyaldumps
>>>> -nodumps $nodumps -compactadtable $compactadtable -model4smoothfactor
>>>> $model4smoothfactor -compactalignmentformat $compactalignmentformat 
>>>> -verbose
>>>> $verbose -verbosesentence $verbosesentence -emalsmooth $emalsmooth
>>>> -model23smoothfactor $model23smoothfactor -model4smoothfactor
>>>> $model4smoothfactor -model5smoothfactor $model5smoothfactor -nsmooth
>>>> $nsmooth -nsmoothgeneral $nsmoothgeneral -deficientdistortionforemptyword
>>>> $deficientdistortionforemptyword -depm4 $depm4 -depm5 $depm5
>>>> -emalignmentdependencies $emalignmentdependencies -emprobforempty
>>>> $emprobforempty -m5p0 $m5p0 -manlexfactor1 $manlexfactor1 -manlexfactor2
>>>> $manlexfactor2 -manlexmaxmultiplicity $manlexmaxmultiplicity -maxfertility
>>>> $maxfertility -p0 $p0 -pegging $pegging
>>>> ****** phase 2.1 of training (merge alignments)
>>>> Traceback (most recent call last):
>>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line
>>>> 24, in <module>
>>>>     files.append(open(sys.argv[i],"r"));
>>>> IOError: [Errno 2] No such file or directory:
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.part*'
>>>> Traceback (most recent call last):
>>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py", line
>>>> 24, in <module>
>>>>     files.append(open(sys.argv[i],"r"));
>>>> IOError: [Errno 2] No such file or directory:
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.part*'
>>>> ****** Rest of parallel training
>>>> Using SCRIPTS_ROOTDIR: /home/Jelita/moses/tools/moses/scripts
>>>> Using single-thread GIZA
>>>> (3) generate word alignment @ Tue Nov  6 13:07:31 SEAST 2012
>>>> Combining forward and inverted alignment from files:
>>>>
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.{bz2,gz}
>>>>
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.{bz2,gz}
>>>> Executing: mkdir -p
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6
>>>> Executing:
>>>> /home/Jelita/moses/tools/moses/scripts/training/symal/giza2bal.pl
>>>> -d "gzip -cd
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.gz"
>>>> -i "gzip -cd
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.gz"
>>>> |/home/Jelita/moses/tools/moses/scripts/training/symal/symal
>>>> -alignment="grow" -diagonal="yes" -final="yes" -both="yes" >
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/aligned.grow-diag-final-and
>>>>
>>>> symal: computing grow alignment: diagonal (1) final (1) both-uncovered (1)
>>>> skip=<0> counts=<0>
>>>> (4) generate lexical translation table 0-0 @ Tue Nov  6 13:07:31 SEAST
>>>> 2012
>>>>
>>>> (/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.id
>>>> ,/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.en,/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/lex)
>>>>
>>>> Use of uninitialized value $a in scalar chomp at
>>>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1079.
>>>> Use of uninitialized value $a in split at
>>>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1082.
>>>>
>>>> What is the cause? I use cygwin for Windows 7 on a 64-bit machine.
>>>> I ran it a few times and it can't get past the Model 1 training.
>>>>
>>>> Thanks.
>>>>
>>>> Best regards,
>>>>
>>>> Jelita
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> [email protected]
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>
>
>
>

