Hi,

There are two possible solutions. The first uses placeholder tokens:
* replace all numbers with a NUMBER token (maybe different ones for
  different types of numbers, e.g. fractions, cardinals, years, ...)
* learn a translation model with these tokens (and hope that NUMBER
  typically gets aligned to NUMBER, or clean up the translation table
  accordingly)
* when decoding, first replace each number with the NUMBER token, and
  then replace the translated NUMBER token with the number you
  remembered.
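
The pre- and post-processing around the decoder could be sketched
roughly as follows. This is only an illustration, not part of Moses:
the function names, the single NUMBER token and the regular expression
are assumptions, and it assumes the numbers come out in the same order
as they went in (with reordering you would need the word alignment
instead).

```python
import re

# Hypothetical pattern: plain digits, optionally with . or , groups (3.5, 1,000)
NUMBER_RE = re.compile(r"^\d+(?:[.,]\d+)*$")

def mask_numbers(tokens):
    """Replace each number by NUMBER and remember the originals in order."""
    masked, numbers = [], []
    for tok in tokens:
        if NUMBER_RE.match(tok):
            masked.append("NUMBER")
            numbers.append(tok)
        else:
            masked.append(tok)
    return masked, numbers

def restore_numbers(tokens, numbers):
    """Put the remembered numbers back, left to right (assumes no reordering)."""
    remaining = list(numbers)
    out = []
    for tok in tokens:
        if tok == "NUMBER" and remaining:
            out.append(remaining.pop(0))
        else:
            out.append(tok)
    return out
```

You would call mask_numbers() on the tokenized input before decoding
and restore_numbers() on the decoder output afterwards.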

Or: you use the XML markup to force the translation of numbers, e.g.
  it is <xml translation="9"> 9 </xml> o' clock.
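
Adding this markup by hand is tedious, so a small script could do it.
The sketch below is an assumption-laden illustration: mark_numbers()
and the number pattern are made up for this example, the input is
assumed to be already tokenized, and each number is simply forced to
translate to itself. As far as I know the decoder also has to be
started with the -xml-input option (e.g. "exclusive") for the markup to
be used.

```python
import re
from xml.sax.saxutils import quoteattr

# Hypothetical pattern: plain digits, optionally with . or , groups
NUMBER_RE = re.compile(r"^\d+(?:[.,]\d+)*$")

def mark_numbers(sentence):
    """Wrap every number token in markup that forces it to be copied as is."""
    out = []
    for tok in sentence.split():
        if NUMBER_RE.match(tok):
            # quoteattr() adds the surrounding quotes and escapes the value
            out.append("<xml translation=%s> %s </xml>" % (quoteattr(tok), tok))
        else:
            out.append(tok)
    return " ".join(out)
```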

-phi

On Tue, Jul 6, 2010 at 11:36 AM, Raphael Payen <[email protected]> wrote:
> Hi all
>
> Is there a way with moses to translate numbers by considering them as
> all part of the number category, but keeping the surface form intact ?
>
> Let's say I have this in the learning corpus:
> 99|number bottles|noun of|prep beer|noun
> And I want to translate this:
> 98|number bottles|noun of|prep beer|noun
>
> If I train a model only on POS tags, it will only recognize the
> sequence "number noun prep noun" but not the surface forms. If I train
> it either only on the surface forms or on the combination of form+POS,
> it will not recognize that 98 can take the place of 99 ?
>
> The option I thought of is to replace numbers with just a "<number>"
> tag and then translate the surface forms. But then, keeping trace of
> which number it was is not very practical: I need to use a factor for
> this, and I must add it to each word ?
>
> --
> Raphael Payen
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>