Hi,

XML markup may be used for reordering constraints, but that is different from treating a word sequence as a unit with respect to the translation model.

-phi

On Fri, Jun 10, 2011 at 7:02 PM, Somayeh Bakhshaei <[email protected]> wrote:
> Dear Dr. Koehn,
>
> for grouping the input tokens (like "the man"), isn't there any solution by
> the help of XML tags?
>
> ------------------
> Best Regards,
> S.Bakhshaei
>
> --- On *Fri, 6/10/11, [email protected] <[email protected]>* wrote:
>
> From: [email protected] <[email protected]>
> Subject: Moses-support Digest, Vol 56, Issue 8
> To: [email protected]
> Date: Friday, June 10, 2011, 8:43 PM
>
> Today's Topics:
>
>    1. How to change phrase representation (Anna c)
>    2. Re: How to change phrase representation (Philipp Koehn)
>    3. FW: How to change phrase representation (Anna c)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 10 Jun 2011 11:38:34 +0200
> From: Anna c <[email protected]>
> Subject: [Moses-support] How to change phrase representation
> To: <[email protected]>
>
> Hi!
> I'm doing a master's degree and I need some help with one of my subjects.
> I've already installed GIZA++ and Moses correctly, and made the step by step
> guide of the web, checking that everything was ok. But I'm a newbie in this
> and I'm a bit lost.
> What I have to do is to change the representation so the
> basic unit won't be the word, but pairs or triplets of words, and compare it
> with the normal representation. How do I do that? Do I have to change the
> preparation step in the training?
>
> Thank you very much!
> Best regards,
> Anna
>
> ------------------------------
>
> Message: 2
> Date: Fri, 10 Jun 2011 10:48:07 +0100
> From: Philipp Koehn <[email protected]>
> Subject: Re: [Moses-support] How to change phrase representation
> To: Anna c <[email protected]>
> Cc: [email protected]
>
> Hi,
>
> I am not entirely sure if I fully understand your question,
> but let me try to answer.
>
> the phrase-based model implementation considers tokens
> separated by a white space as a word. It does also learn
> translation entries for sequences of words ("phrases").
>
> If you want to group words into larger tokens, then you
> have to replace the white spaces.
>
> For instance, if you want to force the training setup and decoder
> to treat "the man" as a unit, then you should replace all
> occurrences (in training data and decoder input) with "the~man".
>
> -phi
>
> ------------------------------
>
> Message: 3
> Date: Fri, 10 Jun 2011 17:37:39 +0200
> From: Anna c <[email protected]>
> Subject: [Moses-support] FW: How to change phrase representation
> To: <[email protected]>, <[email protected]>
>
> I think it would be that. I'm gonna try it. Thank you very much!
>
> And, if it's not too much trouble, I've another question..... I only have
> two sets, training and test (which I've split into four: training.es,
> training.en, test.es, test.en, as the originals had both languages in the
> same line). The training part hasn't got any problem, but as I see on the
> guide, I must use different sets in tuning (in the example,
> dev/nc-dev2007.fr or dev/nc-dev2007.en) and evaluation (devtest/nc-test2007.fr,
> nc-test2007-ref.en.sgm, nc-test2007-src.fr.sgm). Should I use the same set
> in all the steps? I mean, when the example uses a file .fr, I use my
> test.es and when is .en, my test.en. Or should I use a different part of
> it in each step?
>
> Again, thank you very much!
> Anna
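
For reference, the whitespace replacement described in the quoted thread ("the man" becoming "the~man") could be sketched roughly as below. This is only an illustration and not part of Moses: the function names `join_phrase` and `group_ngrams` are hypothetical, and chunking every line into fixed-size, non-overlapping groups is just one assumed reading of "pairs or triplets of words". As the thread says, the same preprocessing would have to be applied to both the training data and the decoder input.

```python
import re


def join_phrase(line: str, phrase: str, joiner: str = "~") -> str:
    """Replace each whole-token occurrence of `phrase` with one joined token.

    Hypothetical helper: "the man" -> "the~man", so that a
    whitespace-based tokenizer sees the phrase as a single word.
    """
    tokens = phrase.split()
    joined = joiner.join(tokens)
    # Match only at token boundaries, so "the manual" is left untouched.
    pattern = r"(?<!\S)" + r"\s+".join(re.escape(t) for t in tokens) + r"(?!\S)"
    return re.sub(pattern, lambda _m: joined, line)


def group_ngrams(line: str, n: int, joiner: str = "~") -> str:
    """Group consecutive tokens into non-overlapping chunks of `n` tokens.

    Assumption: a trailing chunk shorter than `n` is kept as it is.
    """
    tokens = line.split()
    chunks = [joiner.join(tokens[i:i + n]) for i in range(0, len(tokens), n)]
    return " ".join(chunks)


if __name__ == "__main__":
    print(join_phrase("the man saw the manual", "the man"))
    print(group_ngrams("the man saw the dog", 2))
```

The "~" joiner follows the example in the thread; any character that does not otherwise occur in the corpus should work just as well.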
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
