Hi,

two possible solutions:

* you replace all numbers with a NUMBER token (maybe different ones for different types of numbers, e.g. fractions, cardinals, years, ...)
* you learn a translation model with these tokens (and hope that NUMBER typically gets aligned to NUMBER, or clean up the translation table accordingly)
* when decoding, you first replace each number in the input with the NUMBER token, and then replace the NUMBER token in the output with the original number.
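Here is a minimal Python sketch of the masking and unmasking steps. It assumes whitespace-tokenized text, and it assumes the same masking is applied to both sides of the training corpus before training. The placeholder names (`@CARDINAL@`, `@YEAR@`, ...), the regular expressions and the helpers `mask_numbers`/`unmask_numbers` are hypothetical and are not part of Moses.

```python
import re

# Hypothetical placeholder classes; more specific patterns are tried first.
NUMBER_CLASSES = [
    ("@YEAR@", re.compile(r"(1[5-9]|20)\d\d")),   # 1500-2099
    ("@FRACTION@", re.compile(r"\d+/\d+")),
    ("@DECIMAL@", re.compile(r"\d+[.,]\d+")),
    ("@CARDINAL@", re.compile(r"\d+")),
]

def mask_numbers(tokens):
    """Replace numeric tokens with class placeholders.

    Returns the masked tokens and a list of (placeholder, original) pairs
    in source order, which is needed to restore the numbers later.
    """
    masked, originals = [], []
    for tok in tokens:
        for label, pattern in NUMBER_CLASSES:
            if pattern.fullmatch(tok):
                masked.append(label)
                originals.append((label, tok))
                break
        else:  # no pattern matched: keep the token unchanged
            masked.append(tok)
    return masked, originals

def unmask_numbers(tokens, originals):
    """Put the original numbers back, class by class, in source order."""
    queues = {}
    for label, tok in originals:
        queues.setdefault(label, []).append(tok)
    restored = []
    for tok in tokens:
        if queues.get(tok):
            restored.append(queues[tok].pop(0))
        else:
            restored.append(tok)
    return restored

masked, originals = mask_numbers("98 bottles of beer".split())
# masked == ["@CARDINAL@", "bottles", "of", "beer"]
# Pretend the decoder produced this from the masked input:
output = ["@CARDINAL@", "bouteilles", "de", "bière"]
print(" ".join(unmask_numbers(output, originals)))  # 98 bouteilles de bière
```

Restoring numbers per class in source order assumes the decoder never swaps two numbers of the same class. If it can, the word alignment between input and output (if the decoder is asked to report it) would be a more reliable way to map each placeholder back to its number.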
or: you use XML markup to force the translation of numbers:

it is <xml translation="9"> 9 </xml> o' clock.

-phi

On Tue, Jul 6, 2010 at 11:36 AM, Raphael Payen <[email protected]> wrote:
> Hi all
>
> Is there a way with moses to translate numbers by considering them as
> all part of the number category, but keeping the surface form intact?
>
> Let's say I have this in the learning corpus:
> 99|number bottles|noun of|prep beer|noun
> And I want to translate this:
> 98|number bottles|noun of|prep beer|noun
>
> If I train a model only on POS tags, it will only recognize the
> sequence "number noun prep noun" but not the surface forms. If I train
> it either only on the surface forms or on the combination of form+POS,
> it will not recognize that 98 can take the place of 99?
>
> The option I thought of is to replace numbers with just a "<number>"
> tag and then translate the surface forms. But then, keeping track of
> which number it was is not very practical: I need to use a factor for
> this, and I must add it to each word?
>
> --
> Raphael Payen
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
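For the XML-markup option suggested before the quoted message, here is a minimal sketch of a pre-processing step. It wraps every numeric token so that its translation is fixed to the number itself. The tag name `num`, the regular expression and the helper `wrap_numbers` are illustrative assumptions. The markup only takes effect if the decoder is run with XML input enabled (Moses' `-xml-input` switch, e.g. `-xml-input exclusive`).

```python
import re

# A whitespace-delimited number such as 9, 98, 3.5, 3,5 or 1/2 (assumed pattern).
NUMBER_TOKEN = re.compile(r"(?<!\S)\d+(?:[.,/]\d+)*(?!\S)")

def wrap_numbers(sentence, tag="num"):
    """Wrap each numeric token in XML markup whose translation is the number itself."""
    def repl(match):
        n = match.group(0)
        return f'<{tag} translation="{n}">{n}</{tag}>'
    return NUMBER_TOKEN.sub(repl, sentence)

print(wrap_numbers("it is 9 o' clock ."))
# it is <num translation="9">9</num> o' clock .
```

Any `<`, `>` or `&` already present in the input would need escaping before this step, or the decoder may misread them as markup.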
