Hi Philley

Is it possible that source and target got switched somehow in the training 
process? Or perhaps a line got removed from one of the corpus or alignment 
files?

Have a look at one of the offending lines in the corpus files, and in the giza 
source->target and target->source model files to see if you can spot the 
problem. If you can't see anything wrong, then post the lines here.

(The giza alignment files have names like en-es.A3.final.gz)
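
A rough way to test both ideas on a single sentence pair is sketched below in Python. It is only a sketch under assumptions: it takes alignment points that you have already extracted as 0-based (source index, target index) pairs, as the numbers in the warning suggest, and the function names out_of_bounds and diagnose are made up for illustration; they are not part of Moses or GIZA++.

```python
def out_of_bounds(points, src_len, tgt_len):
    """Return the alignment points that fall outside a sentence pair.

    Assumption: each point is a 0-based (source_index, target_index) pair.
    """
    return [(s, t) for s, t in points
            if not (0 <= s < src_len and 0 <= t < tgt_len)]


def diagnose(points, src_tokens, tgt_tokens):
    """Classify one sentence pair as 'ok', 'swapped?' or 'mismatch'."""
    src_len, tgt_len = len(src_tokens), len(tgt_tokens)
    if not out_of_bounds(points, src_len, tgt_len):
        return "ok"
    # If the points only fit once source and target indices are exchanged,
    # the two sides were probably switched somewhere in the pipeline.
    flipped = [(t, s) for s, t in points]
    if not out_of_bounds(flipped, src_len, tgt_len):
        return "swapped?"
    # Otherwise the alignment likely belongs to a different sentence pair,
    # e.g. because a line is missing from one of the files.
    return "mismatch"


# The sentence pair and alignment point from the reported warning.
src = "s l iy p ih nx".split()
tgt = "z z z".split()
print(diagnose([(3, 3)], src, tgt))
```

If most pairs come out as "swapped?", the direction got reversed; if they come out as "mismatch", the corpus and alignment files are more likely out of step by one or more lines.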

cheers - Barry

On Wednesday 07 Mar 2012 19:43:01 Feifan Liu wrote:
> Thanks Joerg!
> I tried the regex you provided, and it still didn't fix the problem. Do any
> other symbols matter, such as the special "|"? (Although I don't know why "|"
> is special.)
> I also found that when I swap the source and target, this problem goes away;
> only three "Sentence mismatch" errors are shown. The Sentence mismatch errors
> also appeared before the swap.
> -Philley
> 
> On Wed, Mar 7, 2012 at 12:47 PM, Joerg Tiedemann <[email protected]> wrote:
> > Maybe you have some control characters in your data or Moses-reserved
> > words?
> > The following regex-substitutions (Perl style) may help you:
> >
> >    s/[\x00-\x1f\x7f\n]//gs;
> >    s/\<(s|unk|\/s|\s*and\s*|)\>//gs;
> >    s/\[\s*and\s*\]//gs;
> >    s/\|/_/gs;
> >
> > Jörg
> >
> > On Wed, Mar 7, 2012 at 7:23 PM, Feifan Liu <[email protected]> wrote:
> > > I checked the data once again, still can't figure out the reason. Any help
> > > will be appreciated.
> > >
> > >
> > > On Tue, Mar 6, 2012 at 11:55 PM, Feifan Liu <[email protected]> wrote:
> > >> Hi All,
> > >> I ran into an "out of bounds" problem when training a model using
> > >> train.perl.
> > >>
> > >> The warning information is:
> > >> WARNING: sentence 2176 has alignment point (3, 3) out of bounds (6, 3)
> > >> T: z z z
> > >> S: s l iy p ih nx
> > >>
> > >> Every sentence pair triggers this warning. E.g. in this sentence there
> > >> are three letters in T, but the alignment points to the 4th (index
> > >> 3). I checked the email archive and didn't find a solution. The
> > >> data contains no empty lines and no "|" symbols.
> > >> I would much appreciate any suggestions.
> > >> -philley
> > >
> > > _______________________________________________
> > > Moses-support mailing list
> > > [email protected]
> > > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> > --
> > Jörg Tiedemann
> > [email protected]
> > Dep. of Linguistics and Philology
> > http://stp.lingfil.uu.se/~joerg/
> > Uppsala University
> > Box 635, SE-751 26 Uppsala/SWEDEN
> > tel: +46 (0)18 - 471 1412
> > fax: +46 (0)18 - 471 1094
> 
