Dear Mailing List,

I am running the commands below from a file:

    export SCRIPTS_ROOTDIR=/data/aca04iba/en-es/bin/moses-scripts/scripts-20080411-1824
    $SCRIPTS_ROOTDIR/training/train-factored-phrase-model.perl \
        -scripts-root-dir $SCRIPTS_ROOTDIR \
        -root-dir /data/aca04iba/en-es/ \
        -corpus /data/aca04iba/en-es/training/corpus.lowercased \
        -f es -e en \
        -alignment grow-diag-final-and \
        -reordering msd-bidirectional-fe \
        -lm 0:5:/data/aca04iba/en-es/lm/corpus.lm:0

This is producing a massive model directory. Its contents (ls -lh) are:

     92M Apr 22 22:49 aligned.0.en
     99M Apr 22 22:48 aligned.0.es
     57M Apr 23 11:27 aligned.grow-diag-final-and
    703M Apr 23 12:06 extract.0-0.gz
    689M Apr 23 12:12 extract.0-0.inv.gz
    705M Apr 23 12:54 extract.0-0.inv.sorted.gz
    532M Apr 23 11:57 extract.0-0.o.gz
    696M Apr 23 12:29 extract.0-0.sorted.gz
     90M Apr 22 22:51 lex.0-0.f2n
     90M Apr 22 22:51 lex.0-0.n2f
     14G Apr 23 14:39 phrase-table.0-0.half.f2n
    7.0G Apr 23 16:16 phrase-table.0-0.half.n2f
    809M Apr 23 14:52 phrase-table.0-0.half.n2f.part0000
    986M Apr 23 14:58 phrase-table.0-0.half.n2f.part0001
    979M Apr 23 15:04 phrase-table.0-0.half.n2f.part0002
    996M Apr 23 15:10 phrase-table.0-0.half.n2f.part0003
    989M Apr 23 15:16 phrase-table.0-0.half.n2f.part0004
    958M Apr 23 15:21 phrase-table.0-0.half.n2f.part0005
    962M Apr 23 15:27 phrase-table.0-0.half.n2f.part0006
    972M Apr 23 15:33 phrase-table.0-0.half.n2f.part0007
    979M Apr 23 15:39 phrase-table.0-0.half.n2f.part0008
    999M Apr 23 15:45 phrase-table.0-0.half.n2f.part0009
    995M Apr 23 15:51 phrase-table.0-0.half.n2f.part0010
   1020M Apr 23 15:56 phrase-table.0-0.half.n2f.part0011
    965M Apr 23 16:02 phrase-table.0-0.half.n2f.part0012
    934M Apr 23 16:08 phrase-table.0-0.half.n2f.part0013
    381M Apr 23 16:10 phrase-table.0-0.half.n2f.part0014

The whole operation fails because I hit my disk quota. Should training be producing such large files? The model directory is 38G, and I didn't realise it would be quite this large. Can anyone advise as to why this is happening?

I am training on the Europarl corpus:

     92M Apr 13 14:14 corpus.lowercased.en
     99M Apr 13 14:13 corpus.lowercased.es

and my language model is:

    155M Apr 13 15:02 corpus.lm

Iain

-- 
Iain Adams
4th Year Undergraduate MCOMP
Marketing Team
Genesys Solutions
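
As a sanity check on the 38G figure, the sizes in the model directory listing can be added up with a short shell sketch. The helper name sum_sizes is made up for illustration and is not part of Moses. It assumes sizes are written as in the ls -lh output above, with only M and G suffixes, and treats 1G as 1024M.

```shell
#!/bin/sh
# sum_sizes (hypothetical helper): read one human-readable size per line,
# e.g. "92M" or "14G", and print the total in G with one decimal place.
# Assumption: only M and G suffixes occur, and 1G = 1024M.
sum_sizes() {
  awk '
    /M$/ { total += substr($1, 1, length($1) - 1) / 1024 }  # M -> G
    /G$/ { total += substr($1, 1, length($1) - 1) }         # already G
    END  { printf "%.1f\n", total }
  '
}

# Sizes copied from the model directory listing in this message.
printf '%s\n' \
  92M 99M 57M 703M 689M 705M 532M 696M 90M 90M 14G 7.0G \
  809M 986M 979M 996M 989M 958M 962M 972M 979M 999M 995M \
  1020M 965M 934M 381M \
  | sum_sizes
```

The printed total should land close to the 38G reported for the directory. On a live directory, du -sh run against the model directory gives the same kind of figure directly.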
