Hi,

yes, that is true: it requires that your input is segmented in the same way, i.e. the phrase has to appear as a~la~derecha in the text you translate as well. This may or may not be a problem. One workaround would be to use the lattice input format to provide multiple segmentations.
-phi

On Thu, Feb 12, 2009 at 1:48 PM, Michael Zuckerman <[email protected]> wrote:
> Thank you for your answer. However I still don't understand something. If
> there is such a phrase in the input to translate, then moses will not know
> that it is equal to the phrase with tildes.
>
> Michael.
>
> On Thu, Feb 12, 2009 at 3:28 PM, Philipp Koehn <[email protected]> wrote:
>>
>> Hi,
>>
>> one thing you can do here is to change the tokenization scheme based
>> on the treetagger output, i.e. make a~la~derecha one word (using the
>> tildes, for instance to glue the parts together).
>>
>> -phi
>>
>> On Thu, Feb 12, 2009 at 1:10 PM, Michael Zuckerman
>> <[email protected]> wrote:
>> > Hello,
>> >
>> > We are trying to run factored training on spanish corpus. We first tag the
>> > corpus with TreeTagger, change the format to "<word>|<lemma>|<tag>
>> > <word>|<lemma>|<tag> ...", and then run the script
>> > train-factored-phrase-model.perl on it. The problem arises when there are
>> > phrases which are treated by TreeTagger as one word, for example
>> > "a la derecha|a~la~derecha|adv". Then train-factored-phrase-model.perl says
>> > that no factor was found for the word "a" and for the word "la" in the file.
>> > Is there a way to tell the script that "a la derecha" should be treated as
>> > one word ?
>> >
>> > Thanks,
>> > Michael.
>> >
>> > _______________________________________________
>> > Moses-support mailing list
>> > [email protected]
>> > http://mailman.mit.edu/mailman/listinfo/moses-support
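
For illustration, the tilde-gluing idea discussed in this thread could be sketched in Python roughly as follows. This is only a sketch under assumptions, not part of Moses or TreeTagger: the function names glue_multiword_tokens and glue_input are made up here, and the code assumes every token has the form <word>|<lemma>|<tag>, that multi-word entries look like "a la derecha|a~la~derecha|adv", and that surface words themselves never contain "|". The first function rewrites the tagged training corpus; the second applies the same segmentation to plain input text, which is the requirement raised in the reply above.

```python
def glue_multiword_tokens(line, glue="~"):
    """Glue multi-word surface forms in a factored line into single tokens.

    Hypothetical helper: assumes tokens look like word|lemma|tag and that
    the surface part of a multi-word entry is spread over several
    whitespace-separated pieces, only the last of which carries the factors.
    """
    out = []
    pending = []  # surface pieces seen so far that carry no factors
    for piece in line.split():
        if "|" not in piece:
            pending.append(piece)
            continue
        word, rest = piece.split("|", 1)
        surface = glue.join(pending + [word])
        out.append(surface + "|" + rest)
        pending = []
    if pending:
        raise ValueError("trailing words without factors: %r" % pending)
    return " ".join(out)


def glue_input(sentence, multiwords, glue="~"):
    """Apply the same gluing to plain input text.

    `multiwords` is a set of token tuples, e.g. {("a", "la", "derecha")},
    assumed to have been collected from the tagged training corpus.
    The longest match starting at each position wins.
    """
    tokens = sentence.split()
    max_len = max((len(m) for m in multiwords), default=1)
    out = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in multiwords:
                out.append(glue.join(tokens[i:i + n]))
                i += n
                break
        else:
            # no multi-word expression starts here; keep the token as is
            out.append(tokens[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(glue_multiword_tokens("gire|girar|verb a la derecha|a~la~derecha|adv"))
    print(glue_input("gire a la derecha", {("a", "la", "derecha")}))
```

Note that glue_input always glues a known phrase, so it commits to one segmentation; where both readings are plausible, the lattice input mentioned above would be the way to offer the decoder both.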
