On Oct 29, 2010, at 9:01, Felipe Sánchez Martínez wrote:

> On 28/10/10 19:34, John Burger wrote:
>> I don't think Moses adds them - it can't know how you trained the LM.
>> We add them ourselves, and tell SRILM not to add them. (We get some
>> small gain in BLEU by doing this, by the way.)
>
> When you say "we add them", what do you mean: you, or Moses? :) It is
> not clear to me from your answer whether Moses adds them when evaluating
> a translation or not.
Sorry. As will be clear from the following, I do not think the Moses decoder adds sentence boundary pseudo-tokens to the input. (I have never confirmed this, though, so I'd dearly like to know if I'm wrong.)

We insert <s> and </s> into all of our data: both sides of the parallel training and tuning data, all the LM data, and finally the test data to be run through the decoder. Accordingly, when we use SRILM, we pass it the -no-sos and -no-eos flags so it does not add the boundary tokens a second time. We find a small but consistent BLEU benefit from having the sentence boundaries in both the translation and language models, even though Moses occasionally produces output like this:

  <s> this paper reported the synthesis of the title compounds , the symptoms and the control . </s> anti-bacterial activity .

Of course we remove the boundary pseudo-tokens from both sides before scoring.

I hope this is clearer.

- John D. Burger
  MITRE

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
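P.S. In case it helps anyone replicate this, here is a minimal sketch of the wrap/strip steps described above. The helper names are my own invention, not anything in Moses or SRILM; the only assumption is that input is whitespace-tokenized, and that stray boundary tokens (like the misplaced </s> in the example output) should be dropped wherever they occur before scoring.

```python
# Boundary pseudo-tokens, matching what SRILM uses internally.
BOS, EOS = "<s>", "</s>"

def add_boundaries(sentence: str) -> str:
    """Wrap a tokenized sentence in explicit boundary pseudo-tokens.

    Applied to both sides of the parallel data, the LM data, and the
    test input, before training/decoding. (Train the LM with SRILM's
    -no-sos -no-eos so it does not add its own copies.)
    """
    return f"{BOS} {sentence.strip()} {EOS}"

def strip_boundaries(sentence: str) -> str:
    """Remove all boundary pseudo-tokens, wherever the decoder left them.

    Applied to hypotheses and references before BLEU scoring; filtering
    every occurrence handles mid-sentence strays, not just the ends.
    """
    return " ".join(t for t in sentence.split() if t not in (BOS, EOS))
```

For example, strip_boundaries("<s> a b </s> c") yields "a b c", so a stray mid-sentence </s> is removed along with the expected outer pair.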
