Hi Horia,
Make sure your sentences are not too long: typically at most 80 or 100
words per sentence.
The Moses script clean-corpus-n.perl takes care of this.
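To make the idea concrete, here is a rough Python sketch of this kind of
length filtering. It is not the actual logic of clean-corpus-n.perl, which
may apply further checks. The function names, the 80-token limit and the
output file stem are assumptions for illustration only.

```python
def filter_pairs(src_lines, tgt_lines, min_len=1, max_len=80):
    """Keep only the pairs whose token counts on both sides are in [min_len, max_len].

    Tokens are whitespace-separated, so an empty line counts as 0 tokens.
    """
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        n_src = len(src.split())
        n_tgt = len(tgt.split())
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            kept.append((src.strip(), tgt.strip()))
    return kept


def filter_files(stem, src_ext, tgt_ext, out_stem, min_len=1, max_len=80):
    """Read stem.src_ext and stem.tgt_ext, write the filtered pairs to out_stem.*.

    Returns the number of pairs kept. All file names here are hypothetical.
    """
    with open(f"{stem}.{src_ext}", encoding="utf-8") as f_src, \
         open(f"{stem}.{tgt_ext}", encoding="utf-8") as f_tgt:
        kept = filter_pairs(f_src, f_tgt, min_len, max_len)
    with open(f"{out_stem}.{src_ext}", "w", encoding="utf-8") as o_src, \
         open(f"{out_stem}.{tgt_ext}", "w", encoding="utf-8") as o_tgt:
        for src, tgt in kept:
            o_src.write(src + "\n")
            o_tgt.write(tgt + "\n")
    return len(kept)
```

For the data in this thread, a call such as
filter_files("dict.train", "gr", "ph", "dict.train.clean") would write
dict.train.clean.gr and dict.train.clean.ph, which could then be passed to
train-model.perl in place of the original corpus.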
On 04/01/2012 17:21, Horia Cucu wrote:
Hi everyone,
I'm trying to build a phrase table using 600k phrase pairs and I'm
encountering the following problem:
$MOSES_SCRIPTS/training/train-model.perl -scripts-root-dir
$MOSES_SCRIPTS -corpus dict.train -f gr -e ph -lm
0:3:/home/cucu/speechRoot/tools/wordsPhonetization/devel/phonetizer1/dict.train.ph.lm
Using SCRIPTS_ROOTDIR: /home/applications/moses/scripts
Using single-thread GIZA
(1) preparing corpus @ Wed Jan 4 11:49:51 EET 2012
Executing: mkdir -p ./corpus
(1.0) selecting factors @ Wed Jan 4 11:49:51 EET 2012
(1.1) running mkcls @ Wed Jan 4 11:49:51 EET 2012
/home/applications/giza-pp/GIZA++-v2/mkcls -c50 -n2 -pdict.train.gr
-V./corpus/gr.vcb.classes opt
Executing: /home/applications/giza-pp/GIZA++-v2/mkcls -c50 -n2
-pdict.train.gr -V./corpus/gr.vcb.classes opt
WARNING: StatVar.cc
At this point the training freezes and I cannot do anything else.
I've tried to isolate this problem by selecting only some of the 600k
phrase pairs (the first 100k, the first 110k, etc.). Everything
worked fine with a dataset of up to 114993 phrase pairs, but failed
for a dataset of 114994 phrase pairs.
I've also selected the last 100k phrase pairs, the last 110k, etc.
(thinking there could be a problem with the actual data). Here
everything worked fine with a dataset of up to 114996 phrase pairs,
but failed for a dataset of 114997 phrase pairs.
Could this be a memory problem? top shows a memory usage of 0% for the
mkcls process and tells me I have 15GB of free RAM (out of a total of
16GB)...
Do you have any idea what the problem might be? What else should
I check?
Thanks,
Horia
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support