On Cygwin, there is no way to solve the problem: 2GB is the maximum
memory any process can use under Cygwin.
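If you want to see the ceiling for yourself, here is a minimal sketch
(run under Cygwin's 32-bit Python; the 64 MB chunk size is arbitrary)
that allocates until the address space runs out:

    # Allocate 64 MB chunks until the 32-bit address space is exhausted.
    # Under 32-bit Cygwin this typically fails somewhere below ~2 GB.
    chunks = []
    allocated_mb = 0
    try:
        while True:
            chunks.append(bytearray(64 * 1024 * 1024))  # 64 MB per chunk
            allocated_mb += 64
    except MemoryError:
        print("MemoryError after allocating ~%d MB" % allocated_mb)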
If you have large model files, you should move to 64-bit Linux or
Mac OS X, with plenty of memory.
Or try compiling mgiza and Moses without Cygwin, for example with
MinGW or Visual Studio.
This will require some work. Other people may be trying to do the same
thing, so maybe team up with them.
For mgiza, you can minimize memory usage by following this:
http://www.statmt.org/moses/?n=Moses.Optimize#ntoc8
However, you may encounter more memory problems further down the
pipeline anyway.
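One common way to cut the aligner's memory footprint is to drop very
long sentence pairs before alignment; Moses ships clean-corpus-n.perl
for exactly this. A minimal Python sketch of the same idea (the corpus
filenames are hypothetical; 60 matches the MaxLen-60 setting visible
in the log paths below):

    # Keep only sentence pairs where both sides are at most MAX_LEN tokens.
    MAX_LEN = 60

    with open("corpus.id") as f1, open("corpus.en") as f2, \
         open("corpus.clean.id", "w") as o1, open("corpus.clean.en", "w") as o2:
        for src, tgt in zip(f1, f2):
            if len(src.split()) <= MAX_LEN and len(tgt.split()) <= MAX_LEN:
                o1.write(src)
                o2.write(tgt)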
I would personally advise against using the Berkeley aligner. IMO,
it's buggy and questions to its developers go unanswered.
On 12/11/2012 12:56, Jelita Asian wrote:
Hi Barry,
Actually how do we solve the more than 2 GB memory problem? Thanks.
Best regards,
Jelita
On Fri, Nov 9, 2012 at 10:51 AM, Jelita Asian
<[email protected]> wrote:
Hi Barry,
Thanks. I will look into it now.
Cheers,
Jelita
On Thu, Nov 8, 2012 at 10:09 PM, Barry Haddow
<[email protected]> wrote:
Hi Jelita
It could be running out of memory. Under cygwin, mgiza will be
limited to 2GB
http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
cheers - Barry
On 06/11/12 07:33, Jelita Asian wrote:
Hi,
I run Moses training using the moses-for-mere-mortals scripts.
The run used to be OK. However, since I increased the number of
words (mostly numbers written out in words, where the words act
as parallel sentences in the corpus for Indonesian and English),
I keep getting an mgiza stack dump, and hence the training fails.
Here is an extract from the log file of the run:
-----------
Model1: Iteration 5
Reading more sentence pairs into memory ...
[sent:100000]
Reading more sentence pairs into memory ...
Reading more sentence pairs into memory ...
Reading more sentence pairs into memory ...
Model1: (5) TRAIN CROSS-ENTROPY 5.82453 PERPLEXITY 56.6706
Model1: (5) VITERBI TRAIN CROSS-ENTROPY 6.59753 PERPLEXITY
96.8401
Model 1 Iteration: 5 took: 87 seconds
Entire Model1 Training took: 444 seconds
NOTE: I am doing iterations with the HMM model!
Read classes: #words: 48562 #classes: 51
Actual number of read words: 48561 stored words: 48561
Read classes: #words: 45484 #classes: 51
Actual number of read words: 45483 stored words: 45483
==========================================================
Hmm Training Started at: Tue Nov 6 12:46:41 2012
./train-AllCorpusIndo.sh: line 1184: 3936 Aborted
(core dumped) $toolsdir/mgiza/bin/mgiza -ncpus
$mgizanumprocessors -c
$modeldir/$lang2-$lang1-int-train.snt -o
$modeldir/$lang2-$lang1 -s $modeldir/$lang1.vcb -t
$modeldir/$lang2.vcb -coocurrencefile
$modeldir/$lang1-$lang2.cooc -ml $ml -countincreasecutoff
$countincreasecutoff -countincreasecutoffal
$countincreasecutoffal -mincountincrease $mincountincrease
-peggedcutoff $peggedcutoff -probcutoff $probcutoff
-probsmooth $probsmooth -m1 $model1iterations -m2
$model2iterations -mh $hmmiterations -m3 $model3iterations
-m4 $model4iterations -m5 $model5iterations -m6
$model6iterations -t1 $model1dumpfrequency -t2
$model2dumpfrequency -t2to3 $transferdumpfrequency -t345
$model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps
$onlyaldumps -nodumps $nodumps -compactadtable
$compactadtable -model4smoothfactor $model4smoothfactor
-compactalignmentformat $compactalignmentformat -verbose
$verbose -verbosesentence $verbosesentence -emalsmooth
$emalsmooth -model23smoothfactor $model23smoothfactor
-model4smoothfactor $model4smoothfactor
-model5smoothfactor $model5smoothfactor -nsmooth $nsmooth
-nsmoothgeneral $nsmoothgeneral
-deficientdistortionforemptyword
$deficientdistortionforemptyword -depm4 $depm4 -depm5
$depm5 -emalignmentdependencies $emalignmentdependencies
-emprobforempty $emprobforempty -m5p0 $m5p0 -manlexfactor1
$manlexfactor1 -manlexfactor2 $manlexfactor2
-manlexmaxmultiplicity $manlexmaxmultiplicity
-maxfertility $maxfertility -p0 $p0 -pegging $pegging
Starting MGIZA
Initializing Global Paras
DEBUG: EnterDEBUG: PrefixDEBUG: LogParsing Arguments
Parameter 'ncpus' changed from '2' to '8'
Parameter 'c' changed from '' to
'/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en-int-train.snt'
Parameter 'o' changed from '112-11-06.124815.Jelita' to
'/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en'
Parameter 's' changed from '' to
'/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en.vcb'
Parameter 't' changed from '' to
'/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id.vcb'
Parameter 'coocurrencefile' changed from '' to
'/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.cooc'
Parameter 'm3' changed from '5' to '3'
Parameter 'm4' changed from '5' to '3'
Parameter 'onlyaldumps' changed from '0' to '1'
Parameter 'nodumps' changed from '0' to '1'
Parameter 'model4smoothfactor' changed from '0.2' to '0.4'
Parameter 'nsmooth' changed from '64' to '4'
Parameter 'p0' changed from '-1' to '0.999'
general parameters:
-------------------
ml = 101 (maximum sentence length)
Here is another extract:
./train-AllCorpusIndo.sh: line 1184: 2756 Aborted
(core dumped) $toolsdir/mgiza/bin/mgiza -ncpus
$mgizanumprocessors -c
$modeldir/$lang1-$lang2-int-train.snt -o
$modeldir/$lang1-$lang2 -s $modeldir/$lang2.vcb -t
$modeldir/$lang1.vcb -coocurrencefile
$modeldir/$lang2-$lang1.cooc -ml $ml -countincreasecutoff
$countincreasecutoff -countincreasecutoffal
$countincreasecutoffal -mincountincrease $mincountincrease
-peggedcutoff $peggedcutoff -probcutoff $probcutoff
-probsmooth $probsmooth -m1 $model1iterations -m2
$model2iterations -mh $hmmiterations -m3 $model3iterations
-m4 $model4iterations -m5 $model5iterations -m6
$model6iterations -t1 $model1dumpfrequency -t2
$model2dumpfrequency -t2to3 $transferdumpfrequency -t345
$model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps
$onlyaldumps -nodumps $nodumps -compactadtable
$compactadtable -model4smoothfactor $model4smoothfactor
-compactalignmentformat $compactalignmentformat -verbose
$verbose -verbosesentence $verbosesentence -emalsmooth
$emalsmooth -model23smoothfactor $model23smoothfactor
-model4smoothfactor $model4smoothfactor
-model5smoothfactor $model5smoothfactor -nsmooth $nsmooth
-nsmoothgeneral $nsmoothgeneral
-deficientdistortionforemptyword
$deficientdistortionforemptyword -depm4 $depm4 -depm5
$depm5 -emalignmentdependencies $emalignmentdependencies
-emprobforempty $emprobforempty -m5p0 $m5p0 -manlexfactor1
$manlexfactor1 -manlexfactor2 $manlexfactor2
-manlexmaxmultiplicity $manlexmaxmultiplicity
-maxfertility $maxfertility -p0 $p0 -pegging $pegging
****** phase 2.1 of training (merge alignments)
Traceback (most recent call last):
File
"/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py",
line 24, in <module>
files.append(open(sys.argv[i],"r"));
IOError: [Errno 2] No such file or directory:
'/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.part*'
Traceback (most recent call last):
File
"/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py",
line 24, in <module>
files.append(open(sys.argv[i],"r"));
IOError: [Errno 2] No such file or directory:
'/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.part*'
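The IOError above is a knock-on failure: mgiza aborted before
writing any A3.final.part files, so the shell passed the
unexpanded 'part*' pattern to merge_alignment.py verbatim, and
the open() call fails on the literal glob. A minimal defensive
sketch that expands the pattern itself (an illustration, not the
shipped script):

    import glob
    import sys

    # Expand the pattern ourselves instead of relying on the shell;
    # fail with a clear message when mgiza produced no part files.
    pattern = sys.argv[1]  # e.g. '.../id-en.A3.final.part*'
    parts = sorted(glob.glob(pattern))
    if not parts:
        sys.exit("no alignment parts match %r -- did mgiza crash?" % pattern)
    files = [open(p, "r") for p in parts]
    print("merging %d part files" % len(files))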
****** Rest of parallel training
Using SCRIPTS_ROOTDIR: /home/Jelita/moses/tools/moses/scripts
Using single-thread GIZA
(3) generate word alignment @ Tue Nov 6 13:07:31 SEAST 2012
Combining forward and inverted alignment from files:
/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.{bz2,gz}
/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.{bz2,gz}
Executing: mkdir -p
/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6
Executing:
/home/Jelita/moses/tools/moses/scripts/training/symal/giza2bal.pl
-d "gzip -cd
/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.gz"
-i "gzip -cd
/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.gz"
|/home/Jelita/moses/tools/moses/scripts/training/symal/symal
-alignment="grow" -diagonal="yes" -final="yes" -both="yes"
>
/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/aligned.grow-diag-final-and
symal: computing grow alignment: diagonal (1) final
(1)both-uncovered (1)
skip=<0> counts=<0>
(4) generate lexical translation table 0-0 @ Tue Nov 6
13:07:31 SEAST 2012
(/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.id,/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.en,/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/lex)
Use of uninitialized value $a in scalar chomp at
/home/Jelita/moses/tools/moses/scripts/training/train-model.perl
line 1079.
Use of uninitialized value $a in split at
/home/Jelita/moses/tools/moses/scripts/training/train-model.perl
line 1082.
What is the cause? I use Cygwin on Windows 7 on a 64-bit
machine. I ran it a few times and it can't get past the
Model 1 training. Thanks.
Best regards,
Jelita
_______________________________________________
Moses-support mailing list
[email protected] <mailto:[email protected]>
http://mailman.mit.edu/mailman/listinfo/moses-support
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.