Hi Raphael,
Using only the POS sequence is the approach we took in our paper on 'Factored Templates':
http://www.aclweb.org/anthology/E/E09/E09-1043.pdf
However, as you point out, you need to have seen every number during
training; otherwise it is treated as an out-of-vocabulary (OOV) word during decoding.
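To make the OOV problem concrete, here is a minimal preprocessing sketch, not part of Moses: numbers are replaced by a placeholder before training and decoding, the original values are saved, and they are put back into the output afterwards. The placeholder `@num@`, the number regex and the helper names `mask_numbers` / `restore_numbers` are hypothetical, and restoring by position assumes the decoder keeps the numbers in their source order, which reordering can break.

```python
import re

# Hypothetical placeholder token: any string that never occurs as a real word will do.
NUM_TOKEN = "@num@"
# Integers and simple decimals such as 99, 3.5 or 1,000 (an assumption; extend as needed).
NUM_RE = re.compile(r"^\d+(?:[.,]\d+)*$")


def mask_numbers(tokens):
    """Replace numeric tokens with NUM_TOKEN; return the masked tokens and the numbers in order."""
    masked, numbers = [], []
    for tok in tokens:
        if NUM_RE.match(tok):
            masked.append(NUM_TOKEN)
            numbers.append(tok)
        else:
            masked.append(tok)
    return masked, numbers


def restore_numbers(tokens, numbers):
    """Put the saved numbers back left to right (assumes the decoder kept their order)."""
    remaining = iter(numbers)
    # If the output has more placeholders than saved numbers, the extras are left untouched.
    return [next(remaining, tok) if tok == NUM_TOKEN else tok for tok in tokens]


train = "99 bottles of beer".split()
source = "98 bottles of beer".split()
print([t for t in source if t not in set(train)])  # ['98'] is OOV

masked, numbers = mask_numbers(source)
masked_vocab = set(mask_numbers(train)[0])
print([t for t in masked if t not in masked_vocab])  # [] after masking

# Hypothetical decoder output for the masked input.
hypothesis = "@num@ bouteilles de bière".split()
print(" ".join(restore_numbers(hypothesis, numbers)))  # 98 bouteilles de bière
```

After masking, "98" and "99" map to the same training token, so the phrase table entry learned from "99 bottles of beer" applies to the new sentence.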
Hieu
On 06/07/2010 11:36, Raphael Payen wrote:
> Hi all
>
> Is there a way in Moses to translate numbers by treating them all as
> members of a number category, while keeping the surface form intact?
>
> Let's say I have this in the learning corpus:
> 99|number bottles|noun of|prep beer|noun
> And I want to translate this:
> 98|number bottles|noun of|prep beer|noun
>
> If I train a model only on POS tags, it will only recognize the
> sequence "number noun prep noun", not the surface forms. If I train
> it either only on the surface forms or on the combination of form+POS,
> it will not recognize that 98 can take the place of 99, right?
>
> The option I thought of is to replace numbers with just a "<number>"
> tag and then translate the surface forms. But then, keeping track of
> which number it was is not very practical: I would need a factor for
> this, and I would have to add it to every word, right?
>
>
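The bookkeeping concern in the question above can be sketched in Python. This is a hypothetical data-preparation helper, not a Moses tool: it writes each token in the `surface|POS|value` style used in the example lines, masks numbers on the surface factor, and keeps the original value in an extra factor. Because (to my understanding) every token in Moses factored input must carry the same number of factors, non-number tokens get a dummy value `-` (an arbitrary choice). The sketch only prepares the data; it does not configure how Moses would use the value factor.

```python
# Hypothetical placeholder for numbers and dummy value for every other token.
NUM_TOKEN = "@num@"
NO_VALUE = "-"


def to_factored(tokens, tags):
    """Build 'surface|POS|value' tokens: numbers are masked on the surface factor and
    their original value is kept in a third factor; other tokens get a dummy value."""
    if len(tokens) != len(tags):
        raise ValueError("need exactly one tag per token")
    out = []
    for tok, tag in zip(tokens, tags):
        if tag == "number":
            out.append(f"{NUM_TOKEN}|{tag}|{tok}")
        else:
            out.append(f"{tok}|{tag}|{NO_VALUE}")
    return " ".join(out)


def numbers_from_factored(line):
    """Recover the saved number values, in order, from a factored line."""
    values = [factor_string.split("|")[2] for factor_string in line.split()]
    return [v for v in values if v != NO_VALUE]


line = to_factored("98 bottles of beer".split(), ["number", "noun", "prep", "noun"])
print(line)  # @num@|number|98 bottles|noun|- of|prep|- beer|noun|-
print(numbers_from_factored(line))  # ['98']
```

As the question anticipates, the extra factor must be present on every word, even though it only carries information on numbers.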
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support