Yes, that's what I'd like to do as well. My earlier idea of carrying the value as a factor was naive, since Moses works on phrases, not on tokens. I think we need the word alignment information inside the phrases. A phrase pair like:

  I am NUM years old and have NUM cats -> Tengo NUM años y tengo NUM gatos

should also record that the third source token is aligned with the second target token, and the eighth source token with the sixth target token. Postprocessing could then assign the values.
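To make the idea concrete, here is a minimal Python sketch of that postprocessing step. It assumes the decoder could somehow hand back token-level alignments as 0-based (source index, target index) pairs, which is exactly the part that is missing today. The function name restore_numbers and the data layout are hypothetical, not anything Moses provides.

```python
def restore_numbers(source_tokens, target_tokens, alignment, placeholder="NUM"):
    """Copy numeric values from the source into aligned placeholders.

    source_tokens: original source sentence, before numbers were masked
    target_tokens: decoder output, still containing the placeholder
    alignment:     (source_index, target_index) pairs, 0-based (assumed format)
    """
    restored = list(target_tokens)
    for src_idx, tgt_idx in alignment:
        # Only touch target tokens that are still placeholders.
        if restored[tgt_idx] == placeholder:
            restored[tgt_idx] = source_tokens[src_idx]
    return restored


source = "I am 42 years old and have 2 cats".split()
target = "Tengo NUM años y tengo NUM gatos".split()
# Third source token -> second target token, eighth -> sixth (0-based below).
alignment = [(2, 1), (7, 5)]

print(" ".join(restore_numbers(source, target, alignment)))
```

Only the alignment points that land on a NUM token are needed here, so in principle the phrase table would not have to store alignments for every token.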
I saw this on the ML:

Barry Haddow wrote:
> The word alignment info code got removed as it was using too much memory. If
> you really need it, then you could go back in svn to the time before the
> multi-threaded code was merged in (before r2477, I think)

Currently, the word alignment info is not even written to the phrase table. It might be feasible to reintroduce the word alignment info, but only for specific tokens? Would this keep the memory use lower than having it for all tokens?

-- 
Raphael Payen

2010/7/14 MikeDL <[email protected]>:
>> For this replacement, I need to keep the value of the number number
>> along the translation, so the best option seems to add it as a factor
>> ? Then, all other words of the corpus need to have an empty factor.
>> It's not such an awful problem, but it seems strange.
>
> This is what I have also been working on. I would like to train using:
>
> I am NUM years old and have NUM cats -> Tengo NUM años, y tienen NUM gatos
> NUM bottles of beer on the wall -> NUM botella de cerveza en la pared
> etc.
>
> Then I want to translate:
>
> I have lived 42 years and have 2 dogs
>
> preprocess it to:
>
> I have lived NUM{1} years and have NUM{2} dogs
>
> get back from decoding
>
> He vivido NUM{1} años y tengo NUM{2} perros
>
> and postprocess this to
>
> He vivido 42 años y tengo 2 perros
>
> So that the NUM token (ignoring the index '{#}') is used for computing
> translation/reordering costs but the output gives me back the index (1, 2) so I
> can replace the NUM token with the actual value in postprocessing. I need the
> index value to handle multiple numbers in a phrase.
>
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
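For reference, the indexed-placeholder round trip MikeDL describes in the quoted message could be sketched in Python roughly as follows. This is only an illustration of the pre- and postprocessing; it assumes the decoder returns the NUM{n} indices intact, which is the open question in this thread. The helper names mask_numbers and unmask_numbers and the number regex are hypothetical.

```python
import re

# Assumption: a "number" is a run of digits with an optional decimal part.
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")
PLACEHOLDER = re.compile(r"NUM\{(\d+)\}")


def mask_numbers(sentence):
    """Replace each number with NUM{n} (n counted from 1); return text and values."""
    values = []

    def replace(match):
        values.append(match.group(0))
        return "NUM{%d}" % len(values)

    return NUMBER.sub(replace, sentence), values


def unmask_numbers(sentence, values):
    """Put the stored values back, using the index inside each NUM{n}."""
    return PLACEHOLDER.sub(lambda m: values[int(m.group(1)) - 1], sentence)


masked, values = mask_numbers("I have lived 42 years and have 2 dogs")
print(masked)
# Pretend this came back from the decoder with the indices preserved.
decoded = "He vivido NUM{1} años y tengo NUM{2} perros"
print(unmask_numbers(decoded, values))
```

Because the lookup goes through the index rather than the position, the postprocessing still works if the decoder reorders the two placeholders.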
