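Hi,

another, and maybe cleaner, fix is to run TreeTagger and all your preprocessing steps (including the gluing with tildes) on the input text as well, not just on the training data. The input is then segmented in the same way as the training corpus, so moses sees a~la~derecha as one word on both sides.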
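A minimal sketch of what such a shared preprocessing step could look like, in Python. It is only an illustration under a few assumptions: that the tagger output has one token per line with tab-separated columns in the order word, tag, lemma; that multiword units contain spaces inside the word column; and that the tilde is free to use as glue. The function name `to_factored` is made up for this example and is not part of Moses or TreeTagger.

```python
def to_factored(tagged_lines, glue="~"):
    """Convert tagger output for ONE sentence into a factored line.

    Assumed input: one token per line, "word<TAB>tag<TAB>lemma".
    Output: "word|lemma|tag word|lemma|tag ...", with any spaces
    inside a word or lemma replaced by the glue character, so that
    a multiword unit such as "a la derecha" becomes a single token.
    """
    tokens = []
    for line in tagged_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        word, tag, lemma = line.split("\t")
        # Glue multiword units: "a la derecha" -> "a~la~derecha"
        word = glue.join(word.split())
        lemma = glue.join(lemma.split())
        tokens.append("|".join((word, lemma, tag)))
    return " ".join(tokens)


if __name__ == "__main__":
    sentence = [
        "gira\tVLfin\tgirar",
        "a la derecha\tADV\ta~la~derecha",
    ]
    print(to_factored(sentence))
```

The point is to call the same function on the training corpus and on the text you later pass to the decoder, so both end up with identical tokens.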
O.

Philipp Koehn wrote:
> Hi,
>
> yes that is true - it requires that your input is segmented in the same
> way. This may or may not be a problem. One work-around would be
> to use the lattice format to provide multiple segmentations.
>
> -phi
>
> On Thu, Feb 12, 2009 at 1:48 PM, Michael Zuckerman
> <[email protected]> wrote:
>> Thank you for your answer. However I still don't understand something. If
>> there is such a phrase in the input to translate, then moses will not know
>> that it is equal to the phrase with tildes.
>>
>> Michael.
>>
>> On Thu, Feb 12, 2009 at 3:28 PM, Philipp Koehn <[email protected]> wrote:
>>> Hi,
>>>
>>> one thing you can do here is to change the tokenization scheme based
>>> on the treetagger output, i.e. make a~la~derecha one word (using the
>>> tildes, for instance to glue the parts together).
>>>
>>> -phi
>>>
>>> On Thu, Feb 12, 2009 at 1:10 PM, Michael Zuckerman
>>> <[email protected]> wrote:
>>>> Hello,
>>>>
>>>> We are trying to run factored training on spanish corpus. We first tag the
>>>> corpus with TreeTagger, change the format to "<word>|<lemma>|<tag>
>>>> <word>|<lemma>|<tag> ...", and then run the script
>>>> train-factored-phrase-model.perl on it. The problem arises when there are
>>>> phrases which are treated by TreeTagger as one word, for example
>>>> "a la derecha|a~la~derecha|adv". Then train-factored-phrase-model.perl says
>>>> that no factor was found for the word "a" and for the word "la" in the
>>>> file.
>>>> Is there a way to tell the script that "a la derecha" should be treated as
>>>> one word ?
>>>>
>>>> Thanks,
>>>> Michael.

-- 
Ondrej Bojar (mailto:[email protected])
http://www.cuni.cz/~obo
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
