Thank you for your answer. However, I still don't understand something. If such a phrase appears in the input to be translated, Moses will not know that it is equivalent to the tilde-joined phrase.
Michael.

On Thu, Feb 12, 2009 at 3:28 PM, Philipp Koehn <[email protected]> wrote:
> Hi,
>
> one thing you can do here is to change the tokenization scheme based
> on the treetagger output, i.e. make a~la~derecha one word (using the
> tildes, for instance to glue the parts together).
>
> -phi
>
> On Thu, Feb 12, 2009 at 1:10 PM, Michael Zuckerman
> <[email protected]> wrote:
> > Hello,
> >
> > We are trying to run factored training on spanish corpus. We first tag the
> > corpus with TreeTagger, change the format to "<word>|<lemma>|<tag>
> > <word>|<lemma>|<tag> ...", and then run the script
> > train-factored-phrase-model.perl on it. The problem arises when there are
> > phrases which are treated by TreeTagger as one word, for example
> > "a la derecha|a~la~derecha|adv". Then train-factored-phrase-model.perl says
> > that no factor was found for the word "a" and for the word "la" in the file.
> > Is there a way to tell the script that "a la derecha" should be treated as
> > one word ?
> >
> > Thanks,
> > Michael.
> >
> > _______________________________________________
> > Moses-support mailing list
> > [email protected]
> > http://mailman.mit.edu/mailman/listinfo/moses-support
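The tilde-gluing step suggested in the thread can be sketched as a small preprocessing function. This is a minimal sketch, not part of Moses: it assumes TreeTagger's tab-separated "word, tag, lemma" output (as produced with its -token -lemma options), and the names glue and to_factored are hypothetical. Because Moses matches tokens as whitespace-separated strings, the same conversion would presumably also have to be run over the text to be translated, so that "a la derecha" reaches the decoder as "a~la~derecha" there too.

```python
from typing import Iterable


def glue(text: str, sep: str = "~") -> str:
    """Join the parts of a multiword unit so it becomes a single token."""
    return sep.join(text.split())


def to_factored(tagger_lines: Iterable[str]) -> str:
    """Turn TreeTagger-style lines into one Moses factored sentence.

    Assumption: each line is 'word<TAB>tag<TAB>lemma', and multiword units
    such as 'a la derecha' arrive in the word column with spaces inside.
    Output tokens use the word|lemma|tag order described in the thread.
    """
    tokens = []
    for line in tagger_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        word, tag, lemma = line.split("\t")
        # Glue surface form and lemma so that no factor contains a space;
        # otherwise "a" and "la" would be read as separate, factorless tokens.
        tokens.append(f"{glue(word)}|{glue(lemma)}|{tag}")
    return " ".join(tokens)


if __name__ == "__main__":
    # Illustrative tagger lines (made up for this sketch).
    sample = ["gira\tVLfin\tgirar", "a la derecha\tadv\ta~la~derecha"]
    print(to_factored(sample))
```

Applying the same function to both the training corpus and the test input keeps the tokenization consistent, which is what lets the phrase table entries for "a~la~derecha" match at decoding time.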
