OK, I've started from scratch. I'm fairly sure I prepared the corpus this way:
1. Tokenized the initial corpora with tokenizer.perl and noted the numbers of the lines that caused errors or warnings.
2. Deleted these lines from both files using sed.
3. Tokenized the files again. No errors.
4. Created a truecase model and truecased the files.
5. Deleted overly long lines with clean-corpus-n.perl (limits 1 50).

Then I started building the translation model with:

nohup nice /opt/moses/scripts/training/train-model.perl --parallel -mgiza -mgiza-cpus 40 -cores 40 -root-dir train -corpus ~/corpus/ru-en.clean -f ru -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm 0:3:$HOME/lm/ru-en.arpa.en:8 -external-bin-dir /opt/moses/mgiza >& training.out &

After ten days of waiting I have a 20-byte phrase-table.tgz again! What am I doing wrong?

I have both the ru-en and en-ru A3.final.gz files, aligned.grow-diag-final-and, lex.e2f and lex.f2e, all of reasonable size, but the phrase table, extract.*.sorted.gz and the reordering table are empty. I still have no idea what goes wrong or why :(

2015-02-14 21:54 GMT+07:00 Kenneth Heafield <[email protected]>:

> Sign my petition to add return code checking to train-model.perl.
>
> On 02/14/2015 09:33 AM, Tom Hoar wrote:
> > An empty phrase-table.gz file is usually the result of an ill-prepared
> > training corpus. Make sure you run the final corpus through
> > clean-corpus-n.perl.
> >
> > On 02/14/2015 09:19 PM, Александр Паньшин wrote:
> >> Hello, everybody!
> >>
> >> I have a problem with moses. I created big parallel corpus by
> >> concatenating a bunch of existing corpuses on
> >> http://opus.lingfil.uu.se. After that I cleaned up results (while
> >> creating tokens script reported some errors. I deleted error-prone
> >> rows from both of parts).
> >>
> >> Then I started to train translation model using mgiza with such an
> >> executable:
> >>
> >> nohup nice /opt/moses/scripts/training/train-model.perl --parallel
> >> -mgiza -mgiza-cpus 20 -cores 20 -root-dir train -corpus
> >> ~/corpus/ru-en.clean -f ru -e en -alignment grow-diag-final-and
> >> -reordering msd-bidirectional-fe -lm 0:3:$HOME/lm/ru-en.arpa.en:8
> >> -external-bin-dir /opt/moses/mgiza >& training.out &
> >>
> >> After a week of work I have this in the end of training.out:
> >> (7) learn reordering model @ Sun Feb 8 15:30:35 MSK 2015
> >> (7.1) [no factors] learn reordering model @ Sun Feb 8 15:30:35 MSK 2015
> >> (7.2) building tables @ Sun Feb 8 15:30:35 MSK 2015
> >> Executing: /opt/moses/scripts/../bin/lexical-reordering-score
> >> /home/adminadmin/working/train/model/extract.o.sorted.gz 0.5
> >> /home/adminadmin/working/train/model/reordering-table. --model "wbe
> >> msd wbe-msd-bidirectional-fe"
> >> Lexical Reordering Scorer
> >> scores lexical reordering models of several types (hierarchical,
> >> phrase-based and word-based-extraction
> >> (8) learn generation model @ Sun Feb 8 15:30:35 MSK 2015
> >> no generation model requested, skipping step
> >> (9) create moses.ini @ Sun Feb 8 15:30:35 MSK 2015
> >>
> >> There is a bunch of files in ~/working/train folder. Looks like
> >> everything is ok, except the tiny problem: phrase-table.tgz has size
> >> of 20 bytes. And, of course, it's not usable at all!
> >>
> >> Can somebody help and give me a direction where to dig?
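
A note on the 20-byte file: a gzip stream of empty input is normally exactly 20 bytes (10-byte header, 2-byte empty deflate block, 8-byte trailer), so a 20-byte table most likely means the step that wrote it received no input at all. Since train-model.perl does not stop on a failed sub-step, one way to narrow this down is to count the lines actually stored in each gzipped file in the model directory and find the earliest empty one. The following is only a rough sketch: the function names gz_report and report_model_dir are made up for illustration, and the directory you pass in is an assumption (the log above suggests ~/working/train/model).

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of Moses.

# Print "<decompressed line count> <compressed size in bytes> <path>"
# for a single gzipped file.
gz_report() {
    local path=$1
    local lines bytes
    lines=$(gzip -cd -- "$path" | wc -l | tr -d ' ')
    bytes=$(wc -c < "$path" | tr -d ' ')
    printf '%s %s %s\n' "$lines" "$bytes" "$path"
}

# Report every .gz file directly inside the given directory.
# The directory is whatever you pass in; nothing is assumed about its layout.
report_model_dir() {
    local dir=$1
    local f
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matches
        gz_report "$f"
    done
}
```

After sourcing the file, something like `report_model_dir ~/working/train/model` would list each file with its line count. Files reporting 0 lines are the ones to trace back: the step that should have produced the first empty file is where to look in training.out.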
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
