Hi everyone,
So, just to get this right: is it currently possible to attach such index values
to tokens, so that the token is translated as if the index were not there, but
the index still appears in the translation?
If so, can anyone point me to where this is explained in the manual or on the website?

Thanks!
Arda Tezcan




________________________________


Message: 1
Date: Wed, 14 Jul 2010 23:42:45 +0100
From: Raphael Payen <[email protected]>
Subject: Re: [Moses-support] Translating numbers
To: MikeDL <[email protected]>
Cc: [email protected]
Message-ID:
    <[email protected]>
Content-Type: text/plain; charset=UTF-8

Yes, that's what I'd like to do as well.

The idea I wrote earlier, having the value as a factor, was
naive, since Moses works on phrases, not on tokens. I think we need
information on the word alignments inside the phrases. A phrase
like:
I am NUM years old and have NUM cats   ->  Tengo NUM años y tengo NUM gatos
should also carry the information that the third token in the source is aligned
with the second in the target, and the eighth with the sixth. Then
postprocessing could assign the values.
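
As a rough illustration of that postprocessing step, here is a minimal Python
sketch. It assumes the alignment points for the NUM tokens are somehow available
as (source index, target index) pairs, which Moses does not currently provide;
the function name restore_numbers and the data layout are hypothetical, chosen
only for this example.

```python
NUM = "NUM"

def restore_numbers(source_values, alignment, target_tokens):
    """Put original number strings back into a translated token list.

    source_values: dict mapping 0-based source positions of NUM tokens
                   to the original number strings (hypothetical layout).
    alignment:     iterable of (source_index, target_index) pairs, 0-based.
    target_tokens: list of target-side tokens containing NUM placeholders.
    """
    restored = list(target_tokens)
    for src_idx, tgt_idx in alignment:
        # Only touch target positions that really hold a placeholder.
        if src_idx in source_values and restored[tgt_idx] == NUM:
            restored[tgt_idx] = source_values[src_idx]
    return restored

# The phrase pair from the example above.
target = "Tengo NUM años y tengo NUM gatos".split()
# 1-based "third -> second" and "eighth -> sixth" become 0-based pairs.
alignment = [(2, 1), (7, 5)]
# Values that preprocessing would have removed from the source sentence.
values = {2: "42", 7: "2"}

print(" ".join(restore_numbers(values, alignment, target)))
```

Any NUM left unaligned stays in the output as a placeholder, so a missing
alignment point is visible rather than silently filled with a wrong value.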

I saw this on the ML:
Barry Haddow wrote:
> The word alignment info code got removed as it was using too much memory. If
> you really need it, then you could go back in svn to the time before the
> multi-threaded code was merged in (before r2477, I think)

Currently, the word alignment info is not even written in the phrase table.

It might be feasible to reintroduce the word alignment info, but only
for specific tokens. Would this keep memory use lower than having
it for all tokens?

-- 
Raphael Payen



2010/7/14 MikeDL <[email protected]>:
>> For this replacement, I need to keep the value of the number
>> throughout the translation, so the best option seems to be to add it as a factor?
>> Then all other words of the corpus need to have an empty factor.
>> It's not such an awful problem, but it seems strange.
>
> This is what I have also been working on. I would like to train using:
>
> I am NUM years old and have NUM cats  ->  Tengo NUM años, y tienen NUM gatos
> NUM bottles of beer on the wall  ->  NUM botella de cerveza en la pared
> etc.
>
> Then I want to translate:
>
> I have lived 42 years and have 2 dogs
>
> preprocess it to:
>
> I have lived NUM{1} years and have NUM{2} dogs
>
> get back from decoding
>
> He vivido NUM{1} años y tengo NUM{2} perros
>
> and postprocess this to
>
> He vivido 42 años y tengo 2 perros
>
> So that the NUM token (ignoring the index '{#}') is used for computing
> translation/reordering costs but the output gives me back the index (1, 2) so I
> can replace the NUM token with the actual value in postprocessing. I need the
> index value to handle multiple numbers in a phrase.
>
>
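
The pre- and postprocessing steps quoted above can be sketched in Python as
follows. This covers only the text rewriting on either side of the decoder; it
assumes the decoder returns the NUM{n} tokens with their indices intact, which
is exactly the open question in this thread. The function names and the number
pattern are hypothetical, chosen only for this example.

```python
import re

# Digits with optional decimal or thousands separators (an assumption;
# adjust to the number formats that occur in the corpus).
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")
PLACEHOLDER_RE = re.compile(r"NUM\{(\d+)\}")

def preprocess(sentence):
    """Replace each number with NUM{n} and return the removed values."""
    values = []

    def to_placeholder(match):
        values.append(match.group(0))
        return "NUM{%d}" % len(values)  # indices start at 1

    return NUMBER_RE.sub(to_placeholder, sentence), values

def postprocess(translation, values):
    """Replace each NUM{n} in the decoder output with the n-th value."""
    def to_value(match):
        index = int(match.group(1))
        if 1 <= index <= len(values):
            return values[index - 1]
        return match.group(0)  # unknown index: leave the placeholder

    return PLACEHOLDER_RE.sub(to_value, translation)

marked, values = preprocess("I have lived 42 years and have 2 dogs")
print(marked)

# Stand-in for the decoder output; no decoder is called here.
decoded = "He vivido NUM{1} años y tengo NUM{2} perros"
print(postprocess(decoded, values))
```

Because the values are looked up by index rather than by position, the
restoration still works if the decoder reorders the two NUM tokens.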



------------------------------


      
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
