Hi Barry,

Thanks so much for bringing the giza model files to my attention!
The files themselves were fine, but I found the problem and fixed it. I am
posting the explanation here in case other people run into this as well.

Over the past two days I had been refining the parallel data, i.e. removing
some poorly matched lines. It turned out that the giza source-target and
target-source files were never regenerated, even though I retrained the model
many times, so the giza alignments no longer matched the new training data. I
simply removed all the generated files and retrained the model, and that fixed
it.
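Deleting the generated files, as above, is the sure fix. To spot this kind of staleness before retraining, one could also compare modification times of the corpus and the giza outputs. This is a minimal Python sketch under that assumption, not part of Moses; `mtimes`, `stale_outputs` and all paths are hypothetical:

```python
from pathlib import Path
from typing import Dict, Iterable, List


def mtimes(paths: Iterable[str]) -> Dict[str, float]:
    """Modification time of each existing path (missing files are skipped)."""
    return {p: Path(p).stat().st_mtime for p in paths if Path(p).exists()}


def stale_outputs(inputs: Dict[str, float], outputs: Dict[str, float]) -> List[str]:
    """Outputs last modified before the newest input, i.e. possibly out of date."""
    if not inputs:
        return []
    newest_input = max(inputs.values())
    return sorted(name for name, mtime in outputs.items() if mtime < newest_input)


if __name__ == "__main__":
    # Hypothetical file names; adjust to your own training directory layout.
    corpus = mtimes(["corpus/train.src", "corpus/train.tgt"])
    giza = mtimes(["giza.src-tgt/src-tgt.A3.final.gz",
                   "giza.tgt-src/tgt-src.A3.final.gz"])
    for path in stale_outputs(corpus, giza):
        print("stale:", path)
```

A timestamp check only flags suspicious files; it cannot prove that a newer file matches the corpus.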

Thanks very much, Barry and Joerg!
-Philley

On Wed, Mar 7, 2012 at 3:08 PM, Barry Haddow <[email protected]> wrote:

> Hi Philley
>
> Is it possible that source and target got switched somehow in the training
> process? Or perhaps a line got removed from one of the corpus or alignment
> files?
>
> Have a look at one of the offending lines in the corpus files, and in the
> giza source->target and target->source model files, to see if you can spot
> the problem. If you can't see anything wrong, then post the lines here.
>
> (The giza alignment files have names like en-es.A3.final.gz)
>
> cheers - Barry
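One quick way to test Barry's second hypothesis (a line missing from one file) is to compare the number of sentence pairs in a giza alignment file with the number of lines in the corpus. A minimal Python sketch, assuming each sentence pair in a GIZA++ *.A3.final file begins with a "# Sentence pair (N) ..." header line; `count_a3_pairs`, `compare_counts` and the paths are hypothetical:

```python
import gzip
import re
from typing import Iterable

# Assumption: every sentence pair in a GIZA++ *.A3.final file starts with a header
# like "# Sentence pair (1) source length 6 target length 3 alignment score : ...".
_HEADER = re.compile(r"^# Sentence pair \(\d+\)")


def count_a3_pairs(lines: Iterable[str]) -> int:
    """Count sentence-pair headers in the lines of an A3.final file."""
    return sum(1 for line in lines if _HEADER.match(line))


def compare_counts(corpus_path: str, a3_path: str) -> None:
    """Print the corpus line count next to the A3 sentence-pair count."""
    with open(corpus_path, encoding="utf-8") as corpus:
        n_lines = sum(1 for _ in corpus)
    opener = gzip.open if a3_path.endswith(".gz") else open
    with opener(a3_path, "rt", encoding="utf-8") as a3:
        n_pairs = count_a3_pairs(a3)
    status = "OK" if n_lines == n_pairs else "MISMATCH"
    print(f"{status}: {n_lines} corpus lines, {n_pairs} A3 sentence pairs")
```

For example, compare_counts("corpus/train.clean.en", "giza.en-es/en-es.A3.final.gz") with your own (here hypothetical) paths. A stale alignment file left over from an earlier, larger corpus would show up as a MISMATCH.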
>
> On Wednesday 07 Mar 2012 19:43:01 Feifan Liu wrote:
> > Thanks Joerg!
> > I tried the regexes you provided, but they still didn't fix the problem.
> > Do any other symbols matter, such as "|"? (Although I don't know why "|"
> > is special.)
> > I also found that when I swap the source and target, the problem goes
> > away, and only three "Sentence mismatch" errors are shown. Those "Sentence
> > mismatch" errors also occurred before.
> > -Philley
> >
> > On Wed, Mar 7, 2012 at 12:47 PM, Joerg Tiedemann <[email protected]> wrote:
> > > Maybe you have some control characters in your data or Moses-reserved
> > > words? The following regex substitutions (Perl style) may help you:
> > >
> > >    s/[\x00-\x1f\x7f\n]//gs;
> > >    s/\<(s|unk|\/s|\s*and\s*|)\>//gs;
> > >    s/\[\s*and\s*\]//gs;
> > >    s/\|/_/gs;
> > >
> > > Jörg
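For anyone who prefers to do this cleanup from Python, here is a minimal port of the four Perl substitutions above, applied in the same order. It assumes each line is cleaned with its newline already stripped, since the first character class also deletes "\n" and would otherwise join lines. `clean_line` and `clean_file` are hypothetical names. As for why "|" is special: Moses uses it to separate factors within a token (e.g. word|POS), which is why the last substitution replaces it.

```python
import re

# Joerg's four Perl substitutions, ported to Python's re module in the same order.
# Assumption: lines are passed in with the trailing newline already removed.
_SUBSTITUTIONS = [
    (re.compile(r"[\x00-\x1f\x7f\n]"), ""),        # control characters
    (re.compile(r"<(s|unk|/s|\s*and\s*|)>"), ""),   # <s>, <unk>, </s>, < and >, <>
    (re.compile(r"\[\s*and\s*\]"), ""),             # [ and ]
    (re.compile(r"\|"), "_"),                       # Moses factor separator
]


def clean_line(line: str) -> str:
    """Apply all substitutions, in order, to a single line."""
    for pattern, replacement in _SUBSTITUTIONS:
        line = pattern.sub(replacement, line)
    return line


def clean_file(src_path: str, dst_path: str) -> None:
    """Rewrite one side of the corpus line by line, keeping the line count."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(clean_line(line.rstrip("\n")) + "\n")
```

Writing exactly one output line per input line keeps both sides of the corpus parallel, even if a line becomes empty after cleaning.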
> > >
> > > On Wed, Mar 7, 2012 at 7:23 PM, Feifan Liu <[email protected]> wrote:
> > > > I checked the data once again, but I still can't figure out the reason.
> > > > Any help will be appreciated.
> > > >
> > > > On Tue, Mar 6, 2012 at 11:55 PM, Feifan Liu <[email protected]> wrote:
> > > >> Hi All,
> > > >> I ran into an "out of bounds" problem when training a model using
> > > >> train.perl.
> > > >>
> > > >> The warning is:
> > > >> WARNING: sentence 2176 has alignment point (3, 3) out of bounds (6, 3)
> > > >> T: z z z
> > > >> S: s l iy p ih nx
> > > >>
> > > >> Every sentence pair has this warning. E.g. for this sentence, there
> > > >> are three letters in T, but the alignment points to the 4th (index
> > > >> 3). I checked the previous email archive but didn't find a solution.
> > > >> There are no empty lines and no "|" symbols in the data.
> > > >> Any suggested solutions would be much appreciated.
> > > >> -philley
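The check behind this warning can be reproduced outside Moses to list every offending line at once. A minimal Python sketch, assuming the word-alignment file holds zero-based "src-tgt" pairs separated by spaces, and that the bounds are printed as (source length, target length), which matches the example above where S has 6 tokens and T has 3. The helper names are hypothetical, and the sentence numbers printed here are 1-based line numbers, which may not match Moses's own numbering.

```python
from itertools import zip_longest
from typing import List, Tuple


def out_of_bounds_points(source: str, target: str,
                         alignment: str) -> List[Tuple[int, int]]:
    """Return the (src, tgt) alignment points that fall outside the sentence pair.

    Assumes zero-based "src-tgt" pairs separated by spaces, e.g. "0-0 1-1 3-3".
    """
    src_len = len(source.split())
    tgt_len = len(target.split())
    bad = []
    for point in alignment.split():
        s, t = (int(x) for x in point.split("-"))
        if s >= src_len or t >= tgt_len:
            bad.append((s, t))
    return bad


def check_files(src_path: str, tgt_path: str, align_path: str) -> None:
    """Report out-of-bounds points and line-count mismatches across three files."""
    with open(src_path, encoding="utf-8") as fs, \
         open(tgt_path, encoding="utf-8") as ft, \
         open(align_path, encoding="utf-8") as fa:
        for n, (s, t, a) in enumerate(zip_longest(fs, ft, fa), start=1):
            if s is None or t is None or a is None:
                print(f"sentence {n}: files have different numbers of lines")
                return
            bad = out_of_bounds_points(s, t, a)
            if bad:
                print(f"sentence {n}: points {bad} out of bounds "
                      f"({len(s.split())}, {len(t.split())})")
```

With a hypothetical alignment line "3-3", out_of_bounds_points("s l iy p ih nx", "z z z", "3-3") returns [(3, 3)]: the target side has only indices 0 to 2. If every pair is flagged, as in this thread, the alignment file most likely belongs to a different (older) version of the corpus.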
> > > >
> > > > _______________________________________________
> > > > Moses-support mailing list
> > > > [email protected]
> > > > http://mailman.mit.edu/mailman/listinfo/moses-support
> > >
> > > --
> > >
> > >
> > > **********************************************************************************
> > > Jörg Tiedemann
> > > [email protected]
> > > Dep. of Linguistics and Philology
> > > http://stp.lingfil.uu.se/~joerg/
> > > Uppsala University                  tel: +46 (0)18 - 471 1412
> > > Box 635, SE-751 26 Uppsala/SWEDEN   fax: +46 (0)18 - 471 1094
> >
>