Hello,

We are trying to run factored training on a Spanish corpus. We first tag the
corpus with TreeTagger, convert the output to the format
"<word>|<lemma>|<tag> <word>|<lemma>|<tag> ...", and then run
train-factored-phrase-model.perl on it. The problem arises with multiword
expressions that TreeTagger treats as a single token, for example
"a la derecha|a~la~derecha|adv". The surface form contains spaces, and
train-factored-phrase-model.perl then reports that no factors were found for
the words "a" and "la" in the file.

Is there a way to tell the script that "a la derecha" should be treated as
one word?
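
One possible workaround, sketched here under assumptions rather than as a
confirmed Moses feature, is to remove the spaces from the surface form during
the format conversion, joining its parts with the same "~" that already
appears in the lemma. The helper names, the (word, tag, lemma) row layout,
and the "vlfin" tag on the first row are illustrative assumptions, not part
of our actual pipeline.

```python
def to_factored_token(word, tag, lemma, joiner="~"):
    """Build one word|lemma|tag token that contains no whitespace.

    Assumption: spaces inside a multiword unit can be replaced by `joiner`
    without colliding with real tokens in the corpus.
    """
    surface = joiner.join(word.split())
    lemma = joiner.join(lemma.split())
    return "|".join((surface, lemma, tag))


def to_factored_line(tagged_rows, joiner="~"):
    """Join (word, tag, lemma) rows into one space-separated factored line."""
    return " ".join(
        to_factored_token(word, tag, lemma, joiner)
        for word, tag, lemma in tagged_rows
    )


# Hypothetical tagger rows for one sentence; the tags are illustrative only.
rows = [
    ("gire", "vlfin", "girar"),
    ("a la derecha", "adv", "a~la~derecha"),
]
line = to_factored_line(rows)
print(line)
```

With this, "a la derecha" reaches the training script as the single token
"a~la~derecha|a~la~derecha|adv". The same joining would presumably have to be
applied to any Spanish text passed to the system later on.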

Thanks,
     Michael.
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
