Thank you for your answer. However, I still don't understand something. If
the input to translate contains such a phrase, Moses will not know that it
is equivalent to the phrase glued together with tildes.
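
To make the question concrete, here is a minimal sketch of the gluing step as
I understand it. It is not part of Moses or TreeTagger; the function name
glue_multiword_tokens is made up, and it assumes that any piece without a "|"
belongs to the next piece that carries the factors.

```python
def glue_multiword_tokens(line, glue="~"):
    """Join the surface words of a multiword token with `glue`.

    Assumption: a piece without "|" is part of the surface form of the
    next piece that does contain "|" (the one carrying lemma and tag).
    """
    glued = []
    pending = []  # surface pieces seen before the piece with the factors
    for piece in line.split():
        pending.append(piece)
        if "|" in piece:
            # This piece closes the token: glue everything collected so far.
            glued.append(glue.join(pending))
            pending = []
    # Trailing pieces with no factors after them are kept unchanged.
    glued.extend(pending)
    return " ".join(glued)


print(glue_multiword_tokens("a la derecha|a~la~derecha|adv el|el|art"))
```

With this, "a la derecha|a~la~derecha|adv" becomes
"a~la~derecha|a~la~derecha|adv" in the training data. What I am unsure about
is how the same phrase gets glued in the input at translation time.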

Michael.

On Thu, Feb 12, 2009 at 3:28 PM, Philipp Koehn <[email protected]> wrote:

> Hi,
>
> one thing you can do here is to change the tokenization scheme based
> on the treetagger output, i.e. make a~la~derecha one word (using the
> tildes, for instance, to glue the parts together).
>
> -phi
>
> On Thu, Feb 12, 2009 at 1:10 PM, Michael Zuckerman
> <[email protected]> wrote:
> > Hello,
> >
> > We are trying to run factored training on a Spanish corpus. We first tag
> > the corpus with TreeTagger, change the format to "<word>|<lemma>|<tag>
> > <word>|<lemma>|<tag> ...", and then run the script
> > train-factored-phrase-model.perl on it. The problem arises when there are
> > phrases which are treated by TreeTagger as one word, for example
> > "a la derecha|a~la~derecha|adv". Then train-factored-phrase-model.perl
> > says that no factor was found for the word "a" and for the word "la" in
> > the file. Is there a way to tell the script that "a la derecha" should be
> > treated as one word?
> >
> > Thanks,
> >      Michael.
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
> >
> >
>
