Dear Dr. Koehn,

For grouping input tokens (like "the man"), is there a solution that uses
XML tags?

------------------

Best Regards,

S.Bakhshaei

--- On Fri, 6/10/11, [email protected] 
<[email protected]> wrote:

From: [email protected] <[email protected]>
Subject: Moses-support Digest, Vol 56, Issue 8
To: [email protected]
Date: Friday, June 10, 2011, 8:43 PM

Send Moses-support mailing list submissions to
    [email protected]

To subscribe or unsubscribe via the World Wide Web, visit
    http://mailman.mit.edu/mailman/listinfo/moses-support
or, via email, send a message with subject or body 'help' to
    [email protected]

You can reach the person managing the list at
    [email protected]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Moses-support digest..."


Today's Topics:

   1. How to change phrase representation (Anna c)
   2. Re: How to change phrase representation (Philipp Koehn)
   3. FW:  How to change phrase representation (Anna c)


----------------------------------------------------------------------

Message: 1
Date: Fri, 10 Jun 2011 11:38:34 +0200
From: Anna c <[email protected]>
Subject: [Moses-support] How to change phrase representation
To: <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset="iso-8859-1"


Hi!
I'm doing a master's degree and I need some help with one of my subjects. I've 
already installed GIZA++ and Moses correctly and worked through the step-by-step 
guide on the website, checking that everything was OK. But I'm a newbie at this 
and I'm a bit lost. What I have to do is change the representation so that the 
basic unit is not the word but pairs or triplets of words, and compare it with 
the normal representation. How do I do that? Do I have to change the preparation 
step in the training?

Thank you very much!
Best regards,
Anna
                           

------------------------------

Message: 2
Date: Fri, 10 Jun 2011 10:48:07 +0100
From: Philipp Koehn <[email protected]>
Subject: Re: [Moses-support] How to change phrase representation
To: Anna c <[email protected]>
Cc: [email protected]
Message-ID: <[email protected]>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

I am not entirely sure if I fully understand your question,
but let me try to answer.

The phrase-based model implementation treats each token
separated by whitespace as a word. It also learns
translation entries for sequences of words ("phrases").

If you want to group words into larger tokens, then you
have to replace the whitespace between them.

For instance, if you want to force the training setup and decoder
to treat "the man" as a unit, then you should replace all
occurrences (in training data and decoder input) with "the~man".

-phi
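
A minimal sketch of the replacement described above, assuming Python for
preprocessing and assuming "~" does not otherwise occur in the data. The
function names join_phrase and group_tokens are hypothetical, not part of
Moses. The first joins one chosen phrase; the second groups every n
consecutive tokens, which is only one possible reading of "pairs or
triplets of words".

```python
import re


def join_phrase(line, phrase, joiner="~"):
    """Replace whole-token occurrences of a multi-word phrase with one joined token."""
    words = phrase.split()
    joined = joiner.join(words)
    # Match the phrase only when it is bounded by whitespace or line edges,
    # so "the man" does not match inside "the manager".
    pattern = r"(?<!\S)" + r"\s+".join(re.escape(w) for w in words) + r"(?!\S)"
    return re.sub(pattern, lambda m: joined, line)


def group_tokens(line, n, joiner="~"):
    """Group every n consecutive whitespace-separated tokens into one token.

    Assumption: groups do not overlap; a shorter final group is kept as is.
    """
    tokens = line.split()
    groups = (joiner.join(tokens[i:i + n]) for i in range(0, len(tokens), n))
    return " ".join(groups)


if __name__ == "__main__":
    print(join_phrase("the man saw the manager", "the man"))
    print(group_tokens("the man saw the dog", 2))
```

Whichever variant is used, the same transformation would have to be applied
to both the training data and the decoder input, as noted above.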



------------------------------

Message: 3
Date: Fri, 10 Jun 2011 17:37:39 +0200
From: Anna c <[email protected]>
Subject: [Moses-support] FW:  How to change phrase representation
To: <[email protected]>, <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset="iso-8859-1"


I think that is what I need. I'm going to try it. Thank you very much!

And, if it's not too much trouble, I have another question. I only have two 
sets, training and test (which I've split into four files: training.es, 
training.en, test.es, test.en, as the originals had both languages on the same 
line). The training part is no problem, but as I see in the guide, I must use 
different sets for tuning (in the example, dev/nc-dev2007.fr or 
dev/nc-dev2007.en) and evaluation (devtest/nc-test2007.fr, 
nc-test2007-ref.en.sgm, nc-test2007-src.fr.sgm). Should I use the same set in 
all the steps? I mean, where the example uses a .fr file I would use my 
test.es, and where it uses a .en file, my test.en. Or should I use a different 
part of it in each step?

Again, thank you very much!
Anna


                           

------------------------------

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support


End of Moses-support Digest, Vol 56, Issue 8
********************************************
