Parsing the log gave me these warnings:

WARNING: DIFFERENT SUMS: (1) (1.15031)
WARNING: DIFFERENT SUMS: (1) (1.18892)
WARNING: Model2 viterbi alignment has zero score.
Here are the different elements that made this alignment probability zero

And this strange piece:
(4) generate lexical translation table 0-0 @ Sun Feb 22 03:07:38 MSK 2015
(/home/adminadmin/corpus/ru-en.clean.ru
,/home/adminadmin/corpus/ru-en.clean.en,/home/adminadmin/working/train/model/lex)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!...There
are TONS of exclamation marks.
Saved: /home/adminadmin/working/train/model/lex.f2e and
/home/adminadmin/working/train/model/lex.e2f
FILE: /home/adminadmin/corpus/ru-en.clean.en

What does it mean?
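
To get an overview of warnings like these, the log can be tallied with a few lines of Python. This is only a minimal sketch under stated assumptions, not part of Moses or MGIZA: the helper names summarize_warnings and gzip_is_empty are hypothetical, and the grouping rule (drop everything from the first parenthesis on) is just a guess at what makes the messages comparable. The second helper checks whether a .gz output decompresses to nothing, which is what a gzip file of about 20 bytes usually turns out to be.

```python
import gzip
from collections import Counter


def summarize_warnings(log_text):
    """Count WARNING messages in a log, grouping lines that differ only
    in their trailing "(...)" details. Hypothetical helper, not a Moses tool."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("WARNING:"):
            # Keep the text before the first "(" so that
            # "DIFFERENT SUMS: (1) (1.15031)" and "... (1) (1.18892)"
            # end up under the same key.
            message = line[len("WARNING:"):].split("(")[0].strip().rstrip(":")
            counts[message] += 1
    return counts


def gzip_is_empty(path):
    """Return True if the gzip file at `path` decompresses to zero bytes."""
    with gzip.open(path, "rb") as handle:
        return handle.read(1) == b""
```

Usage would be something like summarize_warnings(open("training.out").read()), where training.out is the file the training command redirects its output to, followed by gzip_is_empty() on each suspicious .gz file in the model directory.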



2015-02-25 12:32 GMT+07:00 Александр Паньшин <[email protected]>:

> Ok, I've started from scratch. I'm pretty sure that I prepared the corpus
> this way:
>
> 1. Tokenized the initial corpora with tokenizer.perl and noted the numbers
> of the lines that caused errors or warnings.
> 2. Deleted these lines from both files using sed.
> 3. Tokenized the files again. No errors.
> 5. Created a truecase model and truecased the files.
> 6. Deleted overly long lines using clean-corpus-n.perl 1 50.
>
> Then I started building the translation model with:
>
>  nohup nice /opt/moses/scripts/training/train-model.perl --parallel -mgiza
> -mgiza-cpus 40 -cores 40 -root-dir train -corpus ~/corpus/ru-en.clean -f ru
> -e en -alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm
> 0:3:$HOME/lm/ru-en.arpa.en:8 -external-bin-dir /opt/moses/mgiza >&
> training.out &
>
> After ten days of waiting I have a 20-byte phrase-table.tgz again!
> What am I doing wrong?
>
> I have both the ru-en and en-ru A3.final.gz files, plus
> aligned-grow-diag-final.and, lex.e2f and lex.f2e, all of reasonable size,
> but the phrase table, extract.*.sorted.gz and the reordering table are empty.
>
> I still have no idea what goes wrong or why :(
>
> 2015-02-14 21:54 GMT+07:00 Kenneth Heafield <[email protected]>:
>
>> Sign my petition to add return code checking to train-model.perl.
>>
>> On 02/14/2015 09:33 AM, Tom Hoar wrote:
>> > An empty phrase-table.gz file is usually the result of an ill-prepared
>> > training corpus. Make sure you run the final corpus through
>> > clean-corpus-n.perl.
>> >
>> >
>> >
>> > On 02/14/2015 09:19 PM, Александр Паньшин wrote:
>> >> Hello, everybody!
>> >>
>> >> I have a problem with Moses. I created a big parallel corpus by
>> >> concatenating a bunch of existing corpora from
>> >> http://opus.lingfil.uu.se. After that I cleaned up the result (the
>> >> tokenizer script reported some errors, so I deleted the offending
>> >> lines from both sides).
>> >>
>> >> Then I started training the translation model using mgiza with this
>> >> command:
>> >>
>> >> nohup nice /opt/moses/scripts/training/train-model.perl --parallel
>> >> -mgiza -mgiza-cpus 20 -cores 20 -root-dir train -corpus
>> >> ~/corpus/ru-en.clean -f ru -e en -alignment grow-diag-final-and
>> >> -reordering msd-bidirectional-fe -lm 0:3:$HOME/lm/ru-en.arpa.en:8
>> >> -external-bin-dir /opt/moses/mgiza >& training.out &
>> >>
>> >> After a week of work I have this at the end of training.out:
>> >> (7) learn reordering model @ Sun Feb  8 15:30:35 MSK 2015
>> >> (7.1) [no factors] learn reordering model @ Sun Feb  8 15:30:35 MSK 2015
>> >> (7.2) building tables @ Sun Feb  8 15:30:35 MSK 2015
>> >> Executing: /opt/moses/scripts/../bin/lexical-reordering-score
>> >> /home/adminadmin/working/train/model/extract.o.sorted.gz 0.5
>> >> /home/adminadmin/working/train/model/reordering-table. --model "wbe
>> >> msd wbe-msd-bidirectional-fe"
>> >> Lexical Reordering Scorer
>> >> scores lexical reordering models of several types (hierarchical,
>> >> phrase-based and word-based-extraction
>> >> (8) learn generation model @ Sun Feb  8 15:30:35 MSK 2015
>> >>   no generation model requested, skipping step
>> >> (9) create moses.ini @ Sun Feb  8 15:30:35 MSK 2015
>> >>
>> >> There is a bunch of files in the ~/working/train folder. It looks like
>> >> everything is OK, except for one tiny problem: phrase-table.tgz is
>> >> only 20 bytes. And, of course, it's not usable at all!
>> >>
>> >> Can somebody help and point me in the right direction?
>> >>
>> >>
>> >> _______________________________________________
>> >> Moses-support mailing list
>> >> [email protected]
>> >> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
>