On Oct 29, 2010, at 9:01, Felipe Sánchez Martínez wrote:

> On 28/10/10 19:34, John Burger wrote:
>> I don't think Moses adds them - it can't know how you trained the LM.
>> We add them ourselves, and tell SRILM not to add them.  (We get some
>> small gain in BLEU by doing this, by the way.)
>
> When you say "we add them", what do you mean: you or Moses? :)
> It is not clear to me from your answer whether Moses adds them when
> evaluating a translation or not.

Sorry.

As will be clear from the following, I do not think the Moses decoder  
adds sentence boundary pseudo-tokens to the input.  (I have never  
confirmed this, though, so I'd dearly like to know if I'm wrong.)

We insert <s> and </s> into all of our data: both sides of the  
parallel training and tuning data, all the LM data, and finally the  
test data to be run through the decoder.  Accordingly, when we use  
SRILM, we pass it the -no-sos and -no-eos flags.  We find a small but  
consistent benefit from having the sentence boundaries in both the  
translation and language models, even though Moses occasionally  
produces output like this:

   <s> this paper reported the synthesis of the title compounds ,
   the symptoms and the control . </s> anti-bacterial activity .

Of course we remove the boundary pseudo-tokens from both sides before  
scoring.
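
For what it's worth, the wrapping and stripping amounts to something  
like this (a hypothetical Python sketch; the helper names are mine,  
not part of Moses or SRILM):

```python
def add_bounds(sentence):
    """Wrap a tokenized sentence with boundary pseudo-tokens."""
    return "<s> " + sentence.strip() + " </s>"

def strip_bounds(sentence):
    """Remove boundary pseudo-tokens wherever the decoder left them,
    including any spurious mid-sentence </s> in the output."""
    tokens = [t for t in sentence.split() if t not in ("<s>", "</s>")]
    return " ".join(tokens)

# Before training/decoding:
print(add_bounds("this paper reported the synthesis ."))
# Before scoring (note the stray mid-sentence </s> is removed too):
print(strip_bounds("<s> the symptoms . </s> anti-bacterial activity ."))
```

The same stripping is applied to both the system output and the  
references before computing BLEU.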

I hope this is more clear.

- John D. Burger
   MITRE


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
