There's some gunk in the training data, e.g. non-printing characters, leading/trailing spaces, double spaces or tabs, and non-UTF-8 bytes. You'll see it as soon as you look at the sentences the warnings refer to.
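If it helps, here is a minimal Python sketch for finding such lines before training. It assumes one sentence per line, as in a Moses parallel corpus. `line_issues` and `scan` are hypothetical helper names, not part of Moses, and the labels are only illustrative.

```python
"""Flag 'gunk' in a plain-text corpus file, one sentence per line.

A minimal sketch: the checks and labels are illustrative, not anything
Moses itself reports.
"""
import sys
from typing import List, Tuple


def line_issues(raw: bytes) -> List[str]:
    """Return labels for the problems found in one raw line (newline removed)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["non-utf8"]  # other checks are meaningless if decoding fails

    issues = []
    if text[:1].isspace() or text[-1:].isspace():
        issues.append("leading/trailing-whitespace")
    if "  " in text:
        issues.append("double-space")
    if "\t" in text:
        issues.append("tab")
    # str.isprintable() is False for control characters, format characters
    # (e.g. U+200B) and non-ASCII separators (e.g. U+00A0); tabs are reported above.
    if any(not ch.isprintable() and ch != "\t" for ch in text):
        issues.append("non-printing")
    return issues


def scan(path: str) -> List[Tuple[int, List[str]]]:
    """Return (1-based line number, issues) for every problematic line in path."""
    results = []
    with open(path, "rb") as f:  # binary mode, so invalid UTF-8 cannot raise here
        for lineno, raw in enumerate(f, start=1):
            # Strip only "\n" so a stray "\r" (CRLF line endings) is still reported.
            issues = line_issues(raw.rstrip(b"\n"))
            if issues:
                results.append((lineno, issues))
    return results


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for lineno, issues in scan(path):
            print(f"{path}:{lineno}: {', '.join(issues)}")
```

Run it on each side of the corpus, e.g. `python find_gunk.py corpus.de corpus.en` (script and file names are hypothetical), and inspect the flagged lines. Reading the file as bytes keeps non-UTF-8 lines from crashing the scan.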
Hieu Hoang
http://moses-smt.org/

On 30 December 2016 at 16:25, Mike Ladwig <[email protected]> wrote:
> On Wed, Dec 28, 2016 at 4:37 AM, Hieu Hoang <[email protected]> wrote:
>
>>> I am getting significantly (~20%) lower bleu scores than with 2.x but I
>>> have a lot of testing before I will know why.
>>>
>> Moses and Moses2 should give very similar results. Please let me know
>> what you find
>>
>
> In looking at training logs, I am getting many messages like this:
>
> WARNING: sentence 540930 has alignment point (4, 3) out of bounds (4, 4)
> T: europe is changing .
> S: europa verandert sich .
> WARNING: sentence 540931 has alignment point (9, 5) out of bounds (9, 10)
> T: that was the slogan of the last european elections .
> S: das war das motto der letzten europa wahlen .
> WARNING: sentence 540932 has alignment point (6, 0) out of bounds (6, 6)
> T: personally , i am convinced .
> S: personlich stimme ich dem zu .
>
> Thoughts?
> mike.
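To find which sentence pairs produce warnings like these, a hedged Python sketch can recheck an alignment file against the corpus. It assumes the usual Moses alignment format (space-separated, 0-based `i-j` pairs, source index first, such as the `aligned.grow-diag-final-and` file from training) and that the three files are line-parallel. `out_of_bounds` and `check_files` are hypothetical names, not Moses functions.

```python
"""Check word-alignment points against sentence lengths.

A minimal sketch, assuming Moses-style alignment lines: space-separated
"i-j" pairs, 0-based, with i indexing the source and j the target sentence.
Tokens are whitespace-separated words, as in a tokenised Moses corpus.
"""
import sys
from typing import Iterator, List, Tuple


def out_of_bounds(source: str, target: str, alignment: str) -> List[Tuple[int, int]]:
    """Return the alignment points (i, j) that fall outside the sentence pair."""
    n_src = len(source.split())  # split() treats any run of whitespace as one gap
    n_tgt = len(target.split())
    bad = []
    for pair in alignment.split():
        i, j = (int(x) for x in pair.split("-"))
        if not (0 <= i < n_src and 0 <= j < n_tgt):
            bad.append((i, j))
    return bad


def check_files(src_path: str, tgt_path: str,
                align_path: str) -> Iterator[Tuple[int, List[Tuple[int, int]]]]:
    """Yield (1-based line number, bad points) for each offending sentence pair."""
    with open(src_path, encoding="utf-8", errors="replace") as src, \
         open(tgt_path, encoding="utf-8", errors="replace") as tgt, \
         open(align_path, encoding="utf-8") as aln:
        for lineno, (s, t, a) in enumerate(zip(src, tgt, aln), start=1):
            bad = out_of_bounds(s, t, a)
            if bad:
                yield lineno, bad


if __name__ == "__main__":
    for lineno, bad in check_files(*sys.argv[1:4]):
        print(f"line {lineno}: out-of-bounds points {bad}")
```

In each of the three warnings quoted, the first coordinate equals the first bound, i.e. one index past the last token. One possible cause (an assumption, to be checked against the actual lines) is a tokenisation mismatch: `"europa  verandert sich ."` with a double space has 4 tokens under `str.split()` but 5 under `str.split(" ")`, so a tool splitting on single spaces could produce an index that a whitespace-splitting reader rejects.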
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
