Parsing the log gave me these warnings:

WARNING: DIFFERENT SUMS: (1) (1.15031)
WARNING: DIFFERENT SUMS: (1) (1.18892)
WARNING: Model2 viterbi alignment has zero score.
Here are the different elements that made this alignment probability zero

And this strange piece:

(4) generate lexical translation table 0-0 @ Sun Feb 22 03:07:38 MSK 2015
(/home/adminadmin/corpus/ru-en.clean.ru ,/home/adminadmin/corpus/ru-en.clean.en,/home/adminadmin/working/train/model/lex)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!...

There are TONS of exclamation marks.

Saved: /home/adminadmin/working/train/model/lex.f2e and /home/adminadmin/working/train/model/lex.e2f
FILE: /home/adminadmin/corpus/ru-en.clean.en

What does it mean?

2015-02-25 12:32 GMT+07:00 Александр Паньшин <[email protected]>:
> Ok, I've started from scratch. I'm pretty sure that I prepared the corpus
> this way:
>
> 1. I tokenized the initial corpora with tokenizer.perl and noted the numbers
> of the lines that caused any errors or warnings
> 2. Deleted these lines from both files using sed
> 3. Tokenized the files again. No errors
> 4. Created a truecase model and truecased the files.
> 5. Deleted overly long lines by using clean-corpus-n.perl 1 50
>
> Started the translation model creation process with:
>
> nohup nice /opt/moses/scripts/training/train-model.perl --parallel -mgiza
> -mgiza-cpus 40 -cores 40 -root-dir train -corpus ~/corpus/ru-en.clean -f ru
> -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm
> 0:3:$HOME/lm/ru-en.arpa.en:8 -external-bin-dir /opt/moses/mgiza >&
> training.out &
>
> After ten days of waiting I have a 20-byte phrase-table.tgz again!
> What am I doing wrong?
>
> I have both ru-en and en-ru A3.final.gz files,
> aligned-grow-diag-final.and, lex.e2f, lex.f2e of quite good size, but an empty
> phrase-table, extract.*.sorted.gz and reordering table.
>
> I still have no idea what goes wrong and why :(
>
> 2015-02-14 21:54 GMT+07:00 Kenneth Heafield <[email protected]>:
>
>> Sign my petition to add return code checking to train-model.perl.
>>
>> On 02/14/2015 09:33 AM, Tom Hoar wrote:
>> > An empty phrase-table.gz file is usually the result of an ill-prepared
>> > training corpus. Make sure you run the final corpus through
>> > clean-corpus-n.perl.
>> >
>> > On 02/14/2015 09:19 PM, Александр Паньшин wrote:
>> >> Hello, everybody!
>> >>
>> >> I have a problem with moses. I created big parallel corpus by
>> >> concatenating a bunch of existing corpuses on
>> >> http://opus.lingfil.uu.se. After that I cleaned up results (while
>> >> creating tokens script reported some errors. I deleted error-prone
>> >> rows from both of parts).
>> >>
>> >> Then I started to train translation model using mgiza with such an
>> >> executable:
>> >>
>> >> nohup nice /opt/moses/scripts/training/train-model.perl --parallel
>> >> -mgiza -mgiza-cpus 20 -cores 20 -root-dir train -corpus
>> >> ~/corpus/ru-en.clean -f ru -e en -alignment grow-diag-final-and
>> >> -reordering msd-bidirectional-fe -lm 0:3:$HOME/lm/ru-en.arpa.en:8
>> >> -external-bin-dir /opt/moses/mgiza >& training.out &
>> >>
>> >> After a week of work I have this in the end of training.out:
>> >> (7) learn reordering model @ Sun Feb 8 15:30:35 MSK 2015
>> >> (7.1) [no factors] learn reordering model @ Sun Feb 8 15:30:35 MSK 2015
>> >> (7.2) building tables @ Sun Feb 8 15:30:35 MSK 2015
>> >> Executing: /opt/moses/scripts/../bin/lexical-reordering-score
>> >> /home/adminadmin/working/train/model/extract.o.sorted.gz 0.5
>> >> /home/adminadmin/working/train/model/reordering-table. --model "wbe
>> >> msd wbe-msd-bidirectional-fe"
>> >> Lexical Reordering Scorer
>> >> scores lexical reordering models of several types (hierarchical,
>> >> phrase-based and word-based-extraction
>> >> (8) learn generation model @ Sun Feb 8 15:30:35 MSK 2015
>> >> no generation model requested, skipping step
>> >> (9) create moses.ini @ Sun Feb 8 15:30:35 MSK 2015
>> >>
>> >> There is a bunch of files in ~/working/train folder.
>> >> Looks like everything is ok, except the tiny problem: phrase-table.tgz has size
>> >> of 20 bytes. And, of course, it's not usable at all!
>> >>
>> >> Can somebody help and give me a direction where to dig?
>> >>
>> >> _______________________________________________
>> >> Moses-support mailing list
>> >> [email protected]
>> >> http://mailman.mit.edu/mailman/listinfo/moses-support
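
Regarding the corpus preparation steps quoted above: before spending days on alignment, it may help to scan both sides of the cleaned corpus once more for pairs that are empty, too long, or badly mismatched in length. Below is a minimal sketch in Python, not a replacement for clean-corpus-n.perl. The function name find_bad_pairs and the default limits (1, 50, and a length ratio of 9.0) are assumptions made for this example; adjust them to whatever you passed to the cleaning script.

```python
from itertools import zip_longest


def find_bad_pairs(src_path, tgt_path, min_len=1, max_len=50, max_ratio=9.0):
    """Return a list of (line_number, reason) for suspicious sentence pairs.

    Hypothetical helper for illustration; the limits are assumptions.
    """
    problems = []
    with open(src_path, encoding="utf-8") as src, \
         open(tgt_path, encoding="utf-8") as tgt:
        for n, (s, t) in enumerate(zip_longest(src, tgt), start=1):
            if s is None or t is None:
                # One file ended before the other: the sides are misaligned.
                problems.append((n, "files have different numbers of lines"))
                break
            s_len, t_len = len(s.split()), len(t.split())
            if s_len < min_len or t_len < min_len:
                problems.append((n, "empty or too short side"))
            elif s_len > max_len or t_len > max_len:
                problems.append((n, "too long"))
            elif max(s_len, t_len) > max_ratio * min(s_len, t_len):
                problems.append((n, "length ratio too high"))
    return problems


# Example call (paths taken from the thread):
# for n, reason in find_bad_pairs("corpus/ru-en.clean.ru", "corpus/ru-en.clean.en"):
#     print(f"line {n}: {reason}")
```

If this reports nothing, the corpus is at least line-aligned and within the limits, and the cause of the empty phrase table is more likely in a later step.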
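
On the return code point raised in the thread: a 20-byte .gz file is exactly what gzip writes for empty input, so an intermediate step probably failed (or produced nothing) and the pipeline carried on anyway. Here is a rough Python sketch of the kind of check one could wrap around individual steps when running them by hand. The names run_step and gz_is_empty are invented for this example, and nothing here is part of train-model.perl.

```python
import gzip
import subprocess


def gz_is_empty(path):
    """Return True if the gzip file decompresses to zero bytes."""
    with gzip.open(path, "rb") as fh:
        return fh.read(1) == b""


def run_step(cmd, gz_outputs):
    """Run one command; raise if it fails or leaves an empty .gz output.

    cmd is an argument list for subprocess.run; gz_outputs lists the
    gzip files the step is expected to produce (assumption of this sketch).
    """
    result = subprocess.run(cmd)
    if result.returncode != 0:
        raise RuntimeError(f"step failed with exit code {result.returncode}: {cmd}")
    for path in gz_outputs:
        if gz_is_empty(path):
            raise RuntimeError(f"step produced an empty file: {path}")


# Quick check of files that already exist (path from the thread):
# print(gz_is_empty("/home/adminadmin/working/train/model/extract.o.sorted.gz"))
```

Checking the earliest empty file in the train/model folder this way should narrow down which step to rerun and inspect.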
