Hi Barry,

Thanks so much for bringing the giza model file to my attention! The file itself is fine, but checking it led me to the problem, and I got it fixed. I am posting the explanation here in case other people run into this as well.
For the past two days I had been refining some parallel data, i.e. removing lines that were not well matched. I found that the giza source-target and target-source files were never regenerated, even though I retrained the model many times. That is why the giza information did not match the new training data. I removed all the generated files and retrained the model, and that fixed it.

Thanks very much, Barry and Joerg!
-Philley

On Wed, Mar 7, 2012 at 3:08 PM, Barry Haddow <[email protected]> wrote:
> Hi Philey
>
> Is it possible that source and target got switched somehow in the training
> process? Or perhaps a line got removed from one of the corpus or alignment
> files?
>
> Have a look at one of the offending lines in the corpus files, and in the
> giza source->target and target->source model files to see if you can spot
> the problem. If you can't see anything wrong, then post the lines here.
>
> (The giza alignment files have names like en-es.A3.final.gz)
>
> cheers - Barry
>
> On Wednesday 07 Mar 2012 19:43:01 Feifan Liu wrote:
> > Thanks Joerg!
> > I tried the regex you provided, and it still didn't fix the problem. Do
> > any other symbols matter, like the special "|"? (although I don't know
> > why "|" is special).
> > I also found that when I swap the source and target, there is no such
> > problem, only three "Sentence mismatch" errors. The "Sentence mismatch"
> > errors also existed previously.
> > -Philley
> >
> > On Wed, Mar 7, 2012 at 12:47 PM, Joerg Tiedemann <[email protected]> wrote:
> > > Maybe you have some control characters in your data or Moses-reserved
> > > words?
> > > The following regex-substitutions (Perl style) may help you:
> > >
> > > s/[\x00-\x1f\x7f\n]//gs;
> > > s/\<(s|unk|\/s|\s*and\s*|)\>//gs;
> > > s/\[\s*and\s*\]//gs;
> > > s/\|/_/gs;
> > >
> > > Jörg
> > >
> > > On Wed, Mar 7, 2012 at 7:23 PM, Feifan Liu <[email protected]> wrote:
> > > > I checked the data once again and still can't figure out the reason.
> > > > Any help will be appreciated.
> > > >
> > > > On Tue, Mar 6, 2012 at 11:55 PM, Feifan Liu <[email protected]> wrote:
> > > >> Hi All,
> > > >> I ran into an "out of bounds" problem when training a model using
> > > >> train.perl.
> > > >>
> > > >> The warning information is:
> > > >> WARNING: sentence 2176 has alignment point (3, 3) out of bounds (6, 3)
> > > >> T: z z z
> > > >> S: s l iy p ih nx
> > > >>
> > > >> Every sentence pair has this warning. E.g. for this sentence, there
> > > >> are three letters in T, but the alignment points to the 4th (index
> > > >> 3). I checked the previous email archive and didn't find a solution.
> > > >> In the data, there is no empty line and no "|" symbol.
> > > >> Much appreciated if any solutions can be suggested.
> > > >> -philley
> > > >
> > > > _______________________________________________
> > > > Moses-support mailing list
> > > > [email protected]
> > > > http://mailman.mit.edu/mailman/listinfo/moses-support
> > >
> > > --
> > > **********************************************************************************
> > > Jörg Tiedemann                      [email protected]
> > > Dep. of Linguistics and Philology   http://stp.lingfil.uu.se/~joerg/
> > > Uppsala University                  tel: +46 (0)18 - 471 1412
> > > Box 635, SE-751 26 Uppsala/SWEDEN   fax: +46 (0)18 - 471 1094
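
For anyone who wants to reproduce the check behind the "out of bounds" warning by hand, here is a minimal Python sketch. It is not the Moses code: the function name `out_of_bounds_points` is made up for illustration, and it assumes that an alignment point is a 0-based (source index, target index) pair, which is how the warning in this thread reads (bounds (6, 3) for six S tokens and three T tokens). The point (0, 0) in the example is invented; only (3, 3) comes from the warning.

```python
def out_of_bounds_points(source_tokens, target_tokens, points):
    """Return the alignment points that fall outside the sentence pair.

    Assumption: each point is (source_index, target_index), 0-based,
    following the order of the bounds printed in the warning.
    """
    n_src = len(source_tokens)
    n_tgt = len(target_tokens)
    return [
        (s, t)
        for s, t in points
        if not (0 <= s < n_src and 0 <= t < n_tgt)
    ]


# Sentence 2176 from the warning: six source tokens, three target tokens.
source = "s l iy p ih nx".split()
target = "z z z".split()

# (0, 0) is a made-up valid point; (3, 3) is the point from the warning.
points = [(0, 0), (3, 3)]

for s, t in out_of_bounds_points(source, target, points):
    print(f"alignment point ({s}, {t}) out of bounds ({len(source)}, {len(target)})")
```

If nearly every sentence pair fails a check like this, the alignment files probably belong to a different version of the corpus, which is what happened here with the stale giza files.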
