Hi Iain,

That looks about right. Training produces large intermediate files, and the final phrase table will also be pretty big: several gigabytes.
-phi

On Mon, Apr 28, 2008 at 8:07 PM, Iain Adams <[EMAIL PROTECTED]> wrote:
> Dear Mailing List,
>
> I am running the commands below from a file:
>
> export SCRIPTS_ROOTDIR=/data/aca04iba/en-es/bin/moses-scripts/scripts-20080411-1824
>
> $SCRIPTS_ROOTDIR/training/train-factored-phrase-model.perl -scripts-root-dir $SCRIPTS_ROOTDIR -root-dir /data/aca04iba/en-es/ -corpus /data/aca04iba/en-es/training/corpus.lowercased -f es -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:5:/data/aca04iba/en-es/lm/corpus.lm:0
>
> This is producing a massive model directory with contents:
>
> ls -lh
>
> 92M Apr 22 22:49 aligned.0.en
> 99M Apr 22 22:48 aligned.0.es
> 57M Apr 23 11:27 aligned.grow-diag-final-and
> 703M Apr 23 12:06 extract.0-0.gz
> 689M Apr 23 12:12 extract.0-0.inv.gz
> 705M Apr 23 12:54 extract.0-0.inv.sorted.gz
> 532M Apr 23 11:57 extract.0-0.o.gz
> 696M Apr 23 12:29 extract.0-0.sorted.gz
> 90M Apr 22 22:51 lex.0-0.f2n
> 90M Apr 22 22:51 lex.0-0.n2f
> 14G Apr 23 14:39 phrase-table.0-0.half.f2n
> 7.0G Apr 23 16:16 phrase-table.0-0.half.n2f
> 809M Apr 23 14:52 phrase-table.0-0.half.n2f.part0000
> 986M Apr 23 14:58 phrase-table.0-0.half.n2f.part0001
> 979M Apr 23 15:04 phrase-table.0-0.half.n2f.part0002
> 996M Apr 23 15:10 phrase-table.0-0.half.n2f.part0003
> 989M Apr 23 15:16 phrase-table.0-0.half.n2f.part0004
> 958M Apr 23 15:21 phrase-table.0-0.half.n2f.part0005
> 962M Apr 23 15:27 phrase-table.0-0.half.n2f.part0006
> 972M Apr 23 15:33 phrase-table.0-0.half.n2f.part0007
> 979M Apr 23 15:39 phrase-table.0-0.half.n2f.part0008
> 999M Apr 23 15:45 phrase-table.0-0.half.n2f.part0009
> 995M Apr 23 15:51 phrase-table.0-0.half.n2f.part0010
> 1020M Apr 23 15:56 phrase-table.0-0.half.n2f.part0011
> 965M Apr 23 16:02 phrase-table.0-0.half.n2f.part0012
> 934M Apr 23 16:08 phrase-table.0-0.half.n2f.part0013
> 381M Apr 23 16:10 phrase-table.0-0.half.n2f.part0014
>
> The whole operation fails as I hit my quota allowance.
> Should this be producing such large files? This model directory is 38G. I didn't realise it would be quite this large.
>
> Can anyone advise as to why this is happening?
>
> I am training on the Europarl corpus.
>
> 92M Apr 13 14:14 corpus.lowercased.en
> 99M Apr 13 14:13 corpus.lowercased.es
>
> and my language model is:
>
> 155M Apr 13 15:02 corpus.lm
>
> Iain
> --
> Iain Adams
> 4th Year Undergraduate MCOMP
> Marketing Team
> Genesys Solutions
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
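As a rough check on where the 38G goes, the sizes in the quoted `ls -lh` listing can be totalled per file family. This is a minimal sketch, assuming bash and a POSIX awk, and assuming 1G = 1024M as `ls -h` reports it. The function `sum_sizes`, the variable `listing`, and the grouping rule (names ending in `.partNNNN`, other `phrase-table*` files, and everything else by the part of the name before the first dot) are hypothetical illustrations, not part of the Moses scripts. It only adds up the rounded figures quoted in the message; it does not inspect a real model directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total `ls -lh`-style lines (SIZE DATE TIME NAME),
# grouped by file family. Reads the listing on stdin and prints sizes in GB.
sum_sizes() {
  awk '
    # Convert a human-readable size such as 92M, 7.0G or 1020M into MB.
    function to_mb(s,   n, u) {
      n = s + 0                        # numeric prefix; awk stops at the unit letter
      u = substr(s, length(s), 1)      # trailing unit letter
      if (u == "K") return n / 1024
      if (u == "G") return n * 1024
      if (u == "T") return n * 1024 * 1024
      return n                         # treat anything else as MB
    }
    NF >= 2 {
      name = $NF
      if (name ~ /\.part[0-9]+$/)      group = "phrase-table parts"
      else if (name ~ /^phrase-table/) group = "phrase-table halves"
      else { split(name, p, "."); group = p[1] }
      mb = to_mb($1)
      total[group] += mb
      all += mb
    }
    END {
      for (g in total) printf "%-20s %6.1f GB\n", g, total[g] / 1024
      printf "%-20s %6.1f GB\n", "TOTAL", all / 1024
    }'
}

# The listing from the message above, one file per line.
listing=$(cat <<'EOF'
92M Apr 22 22:49 aligned.0.en
99M Apr 22 22:48 aligned.0.es
57M Apr 23 11:27 aligned.grow-diag-final-and
703M Apr 23 12:06 extract.0-0.gz
689M Apr 23 12:12 extract.0-0.inv.gz
705M Apr 23 12:54 extract.0-0.inv.sorted.gz
532M Apr 23 11:57 extract.0-0.o.gz
696M Apr 23 12:29 extract.0-0.sorted.gz
90M Apr 22 22:51 lex.0-0.f2n
90M Apr 22 22:51 lex.0-0.n2f
14G Apr 23 14:39 phrase-table.0-0.half.f2n
7.0G Apr 23 16:16 phrase-table.0-0.half.n2f
809M Apr 23 14:52 phrase-table.0-0.half.n2f.part0000
986M Apr 23 14:58 phrase-table.0-0.half.n2f.part0001
979M Apr 23 15:04 phrase-table.0-0.half.n2f.part0002
996M Apr 23 15:10 phrase-table.0-0.half.n2f.part0003
989M Apr 23 15:16 phrase-table.0-0.half.n2f.part0004
958M Apr 23 15:21 phrase-table.0-0.half.n2f.part0005
962M Apr 23 15:27 phrase-table.0-0.half.n2f.part0006
972M Apr 23 15:33 phrase-table.0-0.half.n2f.part0007
979M Apr 23 15:39 phrase-table.0-0.half.n2f.part0008
999M Apr 23 15:45 phrase-table.0-0.half.n2f.part0009
995M Apr 23 15:51 phrase-table.0-0.half.n2f.part0010
1020M Apr 23 15:56 phrase-table.0-0.half.n2f.part0011
965M Apr 23 16:02 phrase-table.0-0.half.n2f.part0012
934M Apr 23 16:08 phrase-table.0-0.half.n2f.part0013
381M Apr 23 16:10 phrase-table.0-0.half.n2f.part0014
EOF
)

sum_sizes <<< "$listing"
```

On these figures it reports about 38.3 GB in total: roughly 21.0 GB for the two `phrase-table.0-0.half.*` files, 13.6 GB for the `.part` files and 3.2 GB for the `extract.*` files, which matches the 38G Iain saw. The order of the group lines may vary between awk implementations; only the TOTAL line is guaranteed to come last.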
