Hi Philley,

Is it possible that source and target got switched somehow in the training process? Or perhaps a line got removed from one of the corpus or alignment files?
Have a look at one of the offending lines in the corpus files, and in the giza source->target and target->source model files to see if you can spot the problem. If you can't see anything wrong, then post the lines here. (The giza alignment files have names like en-es.A3.final.gz)

cheers - Barry

On Wednesday 07 Mar 2012 19:43:01 Feifan Liu wrote:
> Thanks Joerg!
> I tried the regex you provided, and it still didn't fix the problem. Any
> other symbols matter? like special "|"? (although I don't know why "|" is
> special).
> I also found when I swap the source and target, there is no such a problem,
> only showing three "Sentence mismatch" errors. Previously Sentence mismatch
> also existed.
> -Philley
>
> On Wed, Mar 7, 2012 at 12:47 PM, Joerg Tiedemann <[email protected]> wrote:
> > Maybe you have some control characters in your data or Moses-reserved
> > words?
> > The following regex-substitutions (Perl style) may help you:
> >
> > s/[\x00-\x1f\x7f\n]//gs;
> > s/\<(s|unk|\/s|\s*and\s*|)\>//gs;
> > s/\[\s*and\s*\]//gs;
> > s/\|/_/gs;
> >
> > Jörg
> >
> > On Wed, Mar 7, 2012 at 7:23 PM, Feifan Liu <[email protected]> wrote:
> > > I checked the data once again, still can't figure out the reason. Any
> > > help will be appreciated.
> > >
> > > On Tue, Mar 6, 2012 at 11:55 PM, Feifan Liu <[email protected]> wrote:
> > >> Hi All,
> > >> I run into the problem of "out of bounds" when training a model using
> > >> train.perl.
> > >>
> > >> The warning information is:
> > >> WARNING: sentence 2176 has alignment point (3, 3) out of bounds (6, 3)
> > >> T: z z z
> > >> S: s l iy p ih nx
> > >>
> > >> Every sentence pair has this warning. E.g. for this sentence, there
> > >> are three letters in T, but the alignment point to the 4th(index of
> > >> 3). I checked previous email archive, didn't find the solution. In the
> > >> data, there is no empty line, no "|" symbol.
> > >> Much appreciated if any solutions can be suggested.
> > >> -philley
> > >
> > > _______________________________________________
> > > Moses-support mailing list
> > > [email protected]
> > > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> > --
> > **********************************************************************
> > Jörg Tiedemann                       [email protected]
> > Dep. of Linguistics and Philology    http://stp.lingfil.uu.se/~joerg/
> > Uppsala University                   tel: +46 (0)18 - 471 1412
> > Box 635, SE-751 26 Uppsala/SWEDEN    fax: +46 (0)18 - 471 1094
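
The checks discussed in this thread (matching line counts, no empty lines, no "|" symbols, and alignment points that fit inside the sentence pair) can be sketched in Python roughly as below. This is only an illustration, not part of Moses or GIZA++: the function names `corpus_problems` and `out_of_bounds_points` are hypothetical, and the 0-based (source index, target index) ordering is an assumption read off the warning quoted above, where bounds (6, 3) match the six tokens in S and the three tokens in T.

```python
def corpus_problems(src_lines, tgt_lines):
    """List basic problems in a parallel corpus given as two lists of lines.

    Hypothetical helper: checks only the issues raised in the thread
    (line-count mismatch, empty lines, '|' characters).
    """
    problems = []
    if len(src_lines) != len(tgt_lines):
        problems.append(
            f"line count differs: {len(src_lines)} source vs {len(tgt_lines)} target"
        )
    for name, lines in (("source", src_lines), ("target", tgt_lines)):
        for number, line in enumerate(lines, start=1):
            if not line.strip():
                problems.append(f"{name} line {number} is empty")
            elif "|" in line:
                problems.append(f"{name} line {number} contains '|'")
    return problems


def out_of_bounds_points(source, target, points):
    """Return the alignment points that fall outside the sentence pair.

    Assumption: each point is (source_index, target_index), 0-based,
    and sentences are whitespace-tokenized.
    """
    n_src = len(source.split())
    n_tgt = len(target.split())
    return [
        (s, t) for s, t in points
        if not (0 <= s < n_src and 0 <= t < n_tgt)
    ]
```

For the sentence pair in the warning, `out_of_bounds_points("s l iy p ih nx", "z z z", [(3, 3)])` reports the point, since T has only indices 0 to 2. A point such as (5, 2) fits this pair but falls out of bounds once the two sentences are swapped, which is one way a source/target mix-up could produce warnings on many sentence pairs.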
