Re: [Moses-support] FW: How to change phrase representation

Kevin Gimpel Tue, 14 Jun 2011 10:52:24 -0700

In case no one has responded to your question yet:
Yes, you should use different data sets for tuning and evaluation.  If you
only have one test set, you could split it in half and use one half for
tuning and the other half for evaluation, as long as it's large enough.  You
want to have at least about 1000 sentence pairs for tuning, preferably 2000
or so, and similarly for evaluation.
Kevin
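
A minimal sketch of the split suggested above, assuming the test set is stored as two line-aligned files (one sentence per line, as with test.es and test.en in the quoted message). The function names, the output file names, and the choice to shuffle before splitting are illustrative assumptions, not part of Moses:

```python
import random

def split_parallel(src_lines, tgt_lines, seed=0):
    """Split aligned sentence pairs into two halves (tuning, evaluation).

    Pairs are shuffled together so source and target stay aligned.
    Shuffling is an assumption here; a plain first-half/second-half
    split also works if the sentence order carries no bias.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of lines")
    pairs = list(zip(src_lines, tgt_lines))
    random.Random(seed).shuffle(pairs)
    half = len(pairs) // 2
    return pairs[:half], pairs[half:]

def write_pairs(pairs, src_path, tgt_path):
    """Write sentence pairs back out as two line-aligned files."""
    with open(src_path, "w", encoding="utf-8") as fs, \
         open(tgt_path, "w", encoding="utf-8") as ft:
        for src, tgt in pairs:
            fs.write(src + "\n")
            ft.write(tgt + "\n")

# Hypothetical usage (file names are placeholders):
#   src = open("test.es", encoding="utf-8").read().splitlines()
#   tgt = open("test.en", encoding="utf-8").read().splitlines()
#   tune, evaluation = split_parallel(src, tgt)
#   write_pairs(tune, "tune.es", "tune.en")
#   write_pairs(evaluation, "eval.es", "eval.en")
```

With a test set of about 2000 pairs this yields roughly 1000 pairs per half, the lower bound mentioned above.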


On Fri, Jun 10, 2011 at 11:37 AM, Anna c <[email protected]> wrote:

>   I think it would be that. I'm gonna try it. Thank you very much!
>
> And, if it's not too much trouble, I have another question. I only have
> two sets, training and test (which I've split into four: training.es,
> training.en, test.es, test.en, as the originals had both languages on the
> same line). The training part is no problem, but as I see in the guide,
> I must use different sets for tuning (in the example, dev/nc-dev2007.fr
> or dev/nc-dev2007.en) and evaluation (devtest/nc-test2007.fr,
> nc-test2007-ref.en.sgm, nc-test2007-src.fr.sgm). Should I use the same set
> in all the steps? I mean, when the example uses a .fr file, I use my
> test.es, and when it uses a .en file, my test.en. Or should I use a
> different part of it in each step?
>
> Again, thank you very much!
> Anna
>
>
> > Date: Fri, 10 Jun 2011 10:48:07 +0100
> > Subject: Re: [Moses-support] How to change phrase representation
> > From: [email protected]
> > To: [email protected]
> > CC: [email protected]
>
> >
> > Hi,
> >
> > I am not entirely sure if I fully understand your question,
> > but let me try to answer.
> >
> > The phrase-based model implementation treats tokens
> > separated by white space as words. It also learns
> > translation entries for sequences of words ("phrases").
> >
> > If you want to group words into larger tokens, then you
> > have to replace the white spaces.
> >
> > For instance, if you want to force the training setup and decoder
> > to treat "the man" as a unit, then you should replace all
> > occurrences (in training data and decoder input) with "the~man".
> >
> > -phi
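
The replacement described above could be sketched as follows. This is only an illustration: the function names are hypothetical, the "~" joiner is taken from the example in the message, and it is assumed that "~" does not otherwise occur in the data. Grouping every consecutive, non-overlapping run of n words is one possible reading of "pairs or triplets of words", not something the message prescribes.

```python
import re

def join_phrase(line, phrase, joiner="~"):
    """Replace every whole-word occurrence of `phrase` with a single token."""
    words = phrase.split()
    joined = joiner.join(words)
    # Match the words separated by white space, bounded by white space
    # or the start/end of the line, so "the mango" is left untouched.
    pattern = r"(?<!\S)" + r"\s+".join(re.escape(w) for w in words) + r"(?!\S)"
    return re.sub(pattern, lambda m: joined, line)

def group_ngrams(line, n, joiner="~"):
    """Join each consecutive, non-overlapping run of n words into one token."""
    words = line.split()
    return " ".join(joiner.join(words[i:i + n]) for i in range(0, len(words), n))

print(join_phrase("the man saw the man", "the man"))  # the~man saw the~man
print(group_ngrams("a b c d e", 2))                   # a~b c~d e
```

Whichever transformation is chosen, it has to be applied identically to the training data and to the decoder input, as noted above.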
> >
> > On Fri, Jun 10, 2011 at 10:38 AM, Anna c <[email protected]> wrote:
> > > Hi!
> > > I'm doing a master's degree and I need some help with one of my
> > > subjects. I've already installed GIZA++ and Moses correctly, and
> > > followed the step-by-step guide on the website, checking that
> > > everything was OK. But I'm a newbie at this and I'm a bit lost. What I
> > > have to do is change the representation so that the basic unit is not
> > > the word but pairs or triplets of words, and compare it with the
> > > normal representation. How do I do that? Do I have to change the
> > > preparation step in the training?
> > >
> > > Thank you very much!
> > > Best regards,
> > > Anna
> > >
> > > _______________________________________________
> > > Moses-support mailing list
> > > [email protected]
> > > http://mailman.mit.edu/mailman/listinfo/moses-support
> > >
> > >
>
