Dear Mailing List,

I am running the commands below from a file:

export SCRIPTS_ROOTDIR=/data/aca04iba/en-es/bin/moses-scripts/scripts-20080411-1824

$SCRIPTS_ROOTDIR/training/train-factored-phrase-model.perl \
    -scripts-root-dir $SCRIPTS_ROOTDIR \
    -root-dir /data/aca04iba/en-es/ \
    -corpus /data/aca04iba/en-es/training/corpus.lowercased \
    -f es -e en \
    -alignment grow-diag-final-and \
    -reordering msd-bidirectional-fe \
    -lm 0:5:/data/aca04iba/en-es/lm/corpus.lm:0

This is producing a massive model directory with contents:

ls -lh

92M Apr 22 22:49 aligned.0.en
99M Apr 22 22:48 aligned.0.es
57M Apr 23 11:27 aligned.grow-diag-final-and
703M Apr 23 12:06 extract.0-0.gz
689M Apr 23 12:12 extract.0-0.inv.gz
705M Apr 23 12:54 extract.0-0.inv.sorted.gz
532M Apr 23 11:57 extract.0-0.o.gz
696M Apr 23 12:29 extract.0-0.sorted.gz
90M Apr 22 22:51 lex.0-0.f2n
90M Apr 22 22:51 lex.0-0.n2f
14G Apr 23 14:39 phrase-table.0-0.half.f2n
7.0G Apr 23 16:16 phrase-table.0-0.half.n2f
809M Apr 23 14:52 phrase-table.0-0.half.n2f.part0000
986M Apr 23 14:58 phrase-table.0-0.half.n2f.part0001
979M Apr 23 15:04 phrase-table.0-0.half.n2f.part0002
996M Apr 23 15:10 phrase-table.0-0.half.n2f.part0003
989M Apr 23 15:16 phrase-table.0-0.half.n2f.part0004
958M Apr 23 15:21 phrase-table.0-0.half.n2f.part0005
962M Apr 23 15:27 phrase-table.0-0.half.n2f.part0006
972M Apr 23 15:33 phrase-table.0-0.half.n2f.part0007
979M Apr 23 15:39 phrase-table.0-0.half.n2f.part0008
999M Apr 23 15:45 phrase-table.0-0.half.n2f.part0009
995M Apr 23 15:51 phrase-table.0-0.half.n2f.part0010
1020M Apr 23 15:56 phrase-table.0-0.half.n2f.part0011
965M Apr 23 16:02 phrase-table.0-0.half.n2f.part0012
934M Apr 23 16:08 phrase-table.0-0.half.n2f.part0013
381M Apr 23 16:10 phrase-table.0-0.half.n2f.part0014

The whole operation fails because I hit my disk quota. Should it be
producing such large files? The model directory is 38G; I didn't realise it
would be quite this large.
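
For anyone wanting to reproduce the figures above, this is a minimal sketch of
how the sizes can be measured. It assumes the model directory is the `model`
subdirectory of the -root-dir, and `model_usage` is just a hypothetical helper
name, not part of the Moses scripts.

```shell
# Hypothetical helper (not part of Moses): report how much disk space a
# directory uses and which entries inside it are the largest.
# The default directory name "model" is an assumption.
model_usage() {
    dir="${1:-model}"
    # Total size of the directory, human-readable (e.g. 38G)
    du -sh "$dir"
    # The five largest entries under the directory, sizes in KiB,
    # biggest first (the directory itself is listed first)
    du -ak "$dir" | sort -rn | head -n 5
}
```

It would be called as `model_usage /data/aca04iba/en-es/model`, or with no
argument from inside the root directory.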

Can anyone advise as to why this is happening?

I am training on the Europarl corpus:

92M Apr 13 14:14 corpus.lowercased.en
99M Apr 13 14:13 corpus.lowercased.es

and my language model is:

155M Apr 13 15:02 corpus.lm


Iain
-- 
Iain Adams
4th Year Undergraduate MCOMP
Marketing Team
Genesys Solutions



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
