Thanks to everyone who answered! I guess the solution is either to break up
the corpus or to run it using mingw-w64. Thanks.
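
For the record, this is roughly the split I have in mind (a sketch; the
file names are illustrative, and both sides of the corpus must stay
line-aligned):

   # split each side of the parallel corpus into 100,000-sentence chunks;
   # the files are aligned line by line, so equal line splits stay parallel
   split -l 100000 -d corpus.id corpus.id.part-
   split -l 100000 -d corpus.en corpus.en.part-

Each chunk pair could then be aligned in a separate mgiza run that stays
under the 2GB ceiling.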

Best regards,

Jelita

On Tue, Nov 13, 2012 at 6:08 AM, Hieu Hoang <[email protected]> wrote:

>  On Cygwin, there is no way to solve the problem: 2GB is the maximum any
> process can use.
>
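> A quick sanity check from inside a Cygwin shell, just to confirm the cap
> (exact numbers vary by Cygwin version and settings):
>
>    ulimit -v   # per-process virtual memory limit, in KB
>    ulimit -m   # maximum resident set size, where enforced
>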
> If you have large model files, you should move to 64-bit Linux or Mac OS X,
> with plenty of memory.
>
> Or try to compile mgiza and moses without Cygwin, for example with MinGW
> or Visual Studio.
>
> This will require some work. Other people may be trying to do the same
> thing, so maybe team up with them.
>
> For mgiza, you can minimize memory use by following this:
>    http://www.statmt.org/moses/?n=Moses.Optimize#ntoc8
> However, you may encounter more memory problems further down the pipeline
> anyway.
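>
> One more knob, as an assumption on my part rather than a documented fix:
> mgiza keeps per-worker count tables, so running with fewer threads may
> also lower the peak, at the cost of speed:
>
>    mgiza -ncpus 2 ...    # instead of -ncpus 8; all other flags unchanged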
>
> I would personally advise against using the Berkeley aligner. IMO, it's
> buggy and questions to its developers go unanswered.
>
>
> On 12/11/2012 12:56, Jelita Asian wrote:
>
> Hi Barry,
>
> Actually, how do we solve the more-than-2GB memory problem? Thanks.
>
> Best regards,
>
> Jelita
>
> On Fri, Nov 9, 2012 at 10:51 AM, Jelita Asian <[email protected]> wrote:
>
>> Hi Barry,
>>
>> Thanks. I will look into it now.
>>
>> Cheers,
>>
>> Jelita
>>
>>
>> On Thu, Nov 8, 2012 at 10:09 PM, Barry Haddow <[email protected]> wrote:
>>
>>> Hi Jelita
>>>
>>> It could be running out of memory. Under Cygwin, mgiza will be limited
>>> to 2GB:
>>> http://www.statmt.org/moses/?n=Moses.FAQ#ntoc9
>>>
>>> cheers - Barry
>>>
>>>
>>> On 06/11/12 07:33, Jelita Asian wrote:
>>>
>>>>  Hi,
>>>>
>>>> I run Moses training using the Moses-for-Mere-Mortals scripts. The runs
>>>> used to be OK. However, since I increased the number of words (mostly
>>>> numbers written out in words, which act as parallel sentences in the
>>>> Indonesian and English corpus), I keep getting an mgiza stack dump, and
>>>> so the training fails.
>>>>
>>>> Here is an extract from the log file of the run:
>>>>
>>>> -----------
>>>> Model1: Iteration 5
>>>> Reading more sentence pairs into memory ...
>>>> [sent:100000]
>>>> Reading more sentence pairs into memory ...
>>>> Reading more sentence pairs into memory ...
>>>> Reading more sentence pairs into memory ...
>>>> Model1: (5) TRAIN CROSS-ENTROPY 5.82453 PERPLEXITY 56.6706
>>>> Model1: (5) VITERBI TRAIN CROSS-ENTROPY 6.59753 PERPLEXITY 96.8401
>>>> Model 1 Iteration: 5 took: 87 seconds
>>>> Entire Model1 Training took: 444 seconds
>>>> NOTE: I am doing iterations with the HMM model!
>>>> Read classes: #words: 48562  #classes: 51
>>>> Actual number of read words: 48561 stored words: 48561
>>>> Read classes: #words: 45484  #classes: 51
>>>> Actual number of read words: 45483 stored words: 45483
>>>>
>>>> ==========================================================
>>>> Hmm Training Started at: Tue Nov  6 12:46:41 2012
>>>>
>>>> ./train-AllCorpusIndo.sh: line 1184:  3936 Aborted
>>>> (core dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
>>>> $modeldir/$lang2-$lang1-int-train.snt -o $modeldir/$lang2-$lang1 -s
>>>> $modeldir/$lang1.vcb -t $modeldir/$lang2.vcb -coocurrencefile
>>>> $modeldir/$lang1-$lang2.cooc -ml $ml -countincreasecutoff
>>>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>>>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff
>>>> $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2
>>>> $model2iterations -mh $hmmiterations -m3 $model3iterations -m4
>>>> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1
>>>> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency
>>>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps
>>>> $onlyaldumps -nodumps $nodumps -compactadtable $compactadtable
>>>> -model4smoothfactor $model4smoothfactor -compactalignmentformat
>>>> $compactalignmentformat -verbose $verbose -verbosesentence $verbosesentence
>>>> -emalsmooth $emalsmooth -model23smoothfactor $model23smoothfactor
>>>> -model4smoothfactor $model4smoothfactor -model5smoothfactor
>>>> $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral
>>>> -deficientdistortionforemptyword $deficientdistortionforemptyword -depm4
>>>> $depm4 -depm5 $depm5 -emalignmentdependencies $emalignmentdependencies
>>>> -emprobforempty $emprobforempty -m5p0 $m5p0 -manlexfactor1 $manlexfactor1
>>>> -manlexfactor2 $manlexfactor2 -manlexmaxmultiplicity $manlexmaxmultiplicity
>>>> -maxfertility $maxfertility -p0 $p0 -pegging $pegging
>>>> Starting MGIZA
>>>> Initializing Global Paras
>>>> DEBUG: Enter
>>>> DEBUG: Prefix
>>>> DEBUG: Log
>>>> Parsing Arguments
>>>> Parameter 'ncpus' changed from '2' to '8'
>>>> Parameter 'c' changed from '' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en-int-train.snt'
>>>> Parameter 'o' changed from '112-11-06.124815.Jelita' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en'
>>>> Parameter 's' changed from '' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en.vcb'
>>>> Parameter 't' changed from '' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id.vcb'
>>>> Parameter 'coocurrencefile' changed from '' to
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.cooc'
>>>> Parameter 'm3' changed from '5' to '3'
>>>> Parameter 'm4' changed from '5' to '3'
>>>> Parameter 'onlyaldumps' changed from '0' to '1'
>>>> Parameter 'nodumps' changed from '0' to '1'
>>>> Parameter 'model4smoothfactor' changed from '0.2' to '0.4'
>>>> Parameter 'nsmooth' changed from '64' to '4'
>>>> Parameter 'p0' changed from '-1' to '0.999'
>>>> general parameters:
>>>> -------------------
>>>> ml = 101  (maximum sentence length)
>>>>
>>>> Here is another extract:
>>>>
>>>> ./train-AllCorpusIndo.sh: line 1184:  2756 Aborted
>>>> (core dumped) $toolsdir/mgiza/bin/mgiza -ncpus $mgizanumprocessors -c
>>>> $modeldir/$lang1-$lang2-int-train.snt -o $modeldir/$lang1-$lang2 -s
>>>> $modeldir/$lang2.vcb -t $modeldir/$lang1.vcb -coocurrencefile
>>>> $modeldir/$lang2-$lang1.cooc -ml $ml -countincreasecutoff
>>>> $countincreasecutoff -countincreasecutoffal $countincreasecutoffal
>>>> -mincountincrease $mincountincrease -peggedcutoff $peggedcutoff -probcutoff
>>>> $probcutoff -probsmooth $probsmooth -m1 $model1iterations -m2
>>>> $model2iterations -mh $hmmiterations -m3 $model3iterations -m4
>>>> $model4iterations -m5 $model5iterations -m6 $model6iterations -t1
>>>> $model1dumpfrequency -t2 $model2dumpfrequency -t2to3 $transferdumpfrequency
>>>> -t345 $model345dumpfrequency -th $hmmdumpfrequency -onlyaldumps
>>>> $onlyaldumps -nodumps $nodumps -compactadtable $compactadtable
>>>> -model4smoothfactor $model4smoothfactor -compactalignmentformat
>>>> $compactalignmentformat -verbose $verbose -verbosesentence $verbosesentence
>>>> -emalsmooth $emalsmooth -model23smoothfactor $model23smoothfactor
>>>> -model4smoothfactor $model4smoothfactor -model5smoothfactor
>>>> $model5smoothfactor -nsmooth $nsmooth -nsmoothgeneral $nsmoothgeneral
>>>> -deficientdistortionforemptyword $deficientdistortionforemptyword -depm4
>>>> $depm4 -depm5 $depm5 -emalignmentdependencies $emalignmentdependencies
>>>> -emprobforempty $emprobforempty -m5p0 $m5p0 -manlexfactor1 $manlexfactor1
>>>> -manlexfactor2 $manlexfactor2 -manlexmaxmultiplicity $manlexmaxmultiplicity
>>>> -maxfertility $maxfertility -p0 $p0 -pegging $pegging
>>>> ****** phase 2.1 of training (merge alignments)
>>>> Traceback (most recent call last):
>>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py",
>>>> line 24, in <module>
>>>>     files.append(open(sys.argv[i],"r"));
>>>> IOError: [Errno 2] No such file or directory:
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.part*'
>>>> Traceback (most recent call last):
>>>>   File "/home/Jelita/moses/tools/mgiza/scripts/merge_alignment.py",
>>>> line 24, in <module>
>>>>     files.append(open(sys.argv[i],"r"));
>>>> IOError: [Errno 2] No such file or directory:
>>>> '/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.part*'
>>>> ****** Rest of parallel training
>>>> Using SCRIPTS_ROOTDIR: /home/Jelita/moses/tools/moses/scripts
>>>> Using single-thread GIZA
>>>> (3) generate word alignment @ Tue Nov  6 13:07:31 SEAST 2012
>>>> Combining forward and inverted alignment from files:
>>>>
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.{bz2,gz}
>>>>
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.{bz2,gz}
>>>> Executing: mkdir -p
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6
>>>> Executing: /home/Jelita/moses/tools/moses/scripts/training/symal/giza2bal.pl -d "gzip -cd
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/en-id.A3.final.gz"
>>>> -i "gzip -cd
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/id-en.A3.final.gz"
>>>> |/home/Jelita/moses/tools/moses/scripts/training/symal/symal
>>>> -alignment="grow" -diagonal="yes" -final="yes" -both="yes" >
>>>> /home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/aligned.grow-diag-final-and
>>>>
>>>>
>>>> symal: computing grow alignment: diagonal (1) final (1) both-uncovered (1)
>>>> skip=<0> counts=<0>
>>>> (4) generate lexical translation table 0-0 @ Tue Nov  6 13:07:31 SEAST
>>>> 2012
>>>> (/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.id,/home/Jelita/moses/corpora_trained/lc_clean/MinLen-1.MaxLen-60/CleanAllCorpus15Oct2012.for_train.lowercase.en,/home/Jelita/moses/corpora_trained/model/id-en-CleanAllCorpus15Oct2012.for_train.LM-CleanAllCorpus15Oct2012.for_train-IRSTLM-4-1-improved-kneser-ney-0-1/T-1-1-9-MKCLS-2-50-MGIZA-8-GIZA-101-5-0-5-3-3-0-0-1e-06-1e-05-1e-07-0.03-1e-07-1e-07-0-0-0-0-0-0-0-1-1-0--10-0.2-0-0.4-0.1-4-0-1-0-76-68-2-0.4--1-0-0-20-10-0.999-0-MOSES-6-1-1-60-7-4-1-1-1-0-0-200-1.0-0-20-0-0-0-1000-100-20-0-6/lex)
>>>>
>>>>
>>>> Use of uninitialized value $a in scalar chomp at
>>>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1079.
>>>> Use of uninitialized value $a in split at
>>>> /home/Jelita/moses/tools/moses/scripts/training/train-model.perl line 1082.
>>>>
>>>> What is the cause? I use Cygwin on Windows 7, on a 64-bit machine.
>>>> I have run it a few times and it can't get past the Model 1 training.
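>>>>
>>>> One detail I notice: the IOErrors from merge_alignment.py look like
>>>> knock-on failures - since mgiza aborted, the *.A3.final.part* files it
>>>> should have written never existed. If that is right, a guard in the
>>>> wrapper script (a sketch, reusing the $modeldir variable from the
>>>> command above) would at least fail faster and more clearly:
>>>>
>>>>    # stop before the merge step if mgiza left no alignment parts
>>>>    if ! ls "$modeldir"/*.A3.final.part* >/dev/null 2>&1; then
>>>>        echo "no A3.final.part files found; did mgiza fail?" >&2
>>>>        exit 1
>>>>    fi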
>>>>
>>>> Thanks.
>>>>
>>>> Best regards,
>>>>
>>>> Jelita
>>>>
>>>
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in
>>> Scotland, with registration number SC005336.
>>>
>>>
>>
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
