Hi,
I am not sure I fully understand your question,
but let me try to answer.
The phrase-based model implementation treats tokens
separated by whitespace as words. It also learns
translation entries for sequences of words ("phrases").
If you want to group words into larger tokens, you
have to replace the whitespace between them.
For instance, if you want to force the training setup and the decoder
to treat "the man" as a unit, you should replace all
occurrences (in the training data and the decoder input) with "the~man".
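As a sketch (this script is not part of Moses, just an example preprocessing step you would run on the corpus and decoder input before training), you could join consecutive words into units of two or three like this:

```python
# Sketch: join consecutive words into n-word units separated by "~",
# so that Moses' whitespace tokenization sees each unit as one token.
# Assumes the input is already whitespace-tokenized.

def join_ngrams(line, n=2, sep="~"):
    """Group consecutive words into non-overlapping units of n words."""
    words = line.split()
    units = [sep.join(words[i:i + n]) for i in range(0, len(words), n)]
    return " ".join(units)

print(join_ngrams("the man saw the dog"))      # -> "the~man saw~the dog"
print(join_ngrams("the man saw the dog", n=3)) # -> "the~man~saw the~dog"
```

You would apply the same transformation to both sides of the parallel corpus and to the decoder input, and undo it (split on "~") in the output before comparing with the normal word-based representation.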
-phi
On Fri, Jun 10, 2011 at 10:38 AM, Anna c <[email protected]> wrote:
> Hi!
> I'm doing a master's degree and I need some help with one of my subjects.
> I've already installed GIZA++ and Moses correctly, and made the step by step
> guide of the web, checking that everything was ok. But I'm a newbie in this
> and I'm a bit lost. What I have to do is to change the representation so the
> basic unit won't be the word, but pairs or triplets of words, and compare it
> with the normal representation. How do I do that? Do I have to change the
> preparation step in the training?
>
> Thank you very much!
> Best regards,
> Anna
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>