It looks like Moses, by default at least, implicitly adds <s> and </s>
to the target side for language model scoring.  This behavior is
independent of which LM implementation is used inside Moses.  I don't
know if there's an option to disable it.  I can tell you that language
models are not designed to score <s> and </s> in the middle of a
sentence, so you'll get weird results if the boundary tokens are
duplicated.
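
To illustrate the duplication problem, here is a minimal Python sketch.
It is not Moses or kenlm code: wrap_for_scoring and internal_boundaries
are hypothetical helper names, and the wrapping step only assumes the
default behavior described above (boundary tokens are always added).

```python
BOS, EOS = "<s>", "</s>"

def wrap_for_scoring(tokens):
    """Assumed default behavior: always add <s> and </s> around the tokens."""
    return [BOS] + list(tokens) + [EOS]

def internal_boundaries(tokens):
    """Return positions of <s> not at the start, or </s> not at the end."""
    last = len(tokens) - 1
    return [i for i, t in enumerate(tokens)
            if (t == BOS and i != 0) or (t == EOS and i != last)]

plain = "the cat sat".split()
marked = "<s> the cat sat </s>".split()  # output that already has boundaries

print(wrap_for_scoring(plain))
print(internal_boundaries(wrap_for_scoring(plain)))   # no internal tokens
print(internal_boundaries(wrap_for_scoring(marked)))  # duplicated boundaries
```

If the translation output already carries its own boundary tokens, the
implicit wrapping leaves a second <s> and </s> inside the sequence, which
is the case the language model was not trained to score.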

Kenneth

On 10/29/10 10:37, Kenneth Heafield wrote:
> That documentation was specific to kenlm's query tool.  kenlm does the
> same thing as SRI with respect to sentence boundary tokens.  As to what
> that is, I'm deferring to Edinburgh.
> 
> Kenneth
> 
> On 10/29/10 10:28, John Burger wrote:
>> Kenneth Heafield wrote:
>>
>>> kenlm's query tool implicitly places <s> at the beginning. It doesn't
>>> appear in the output, but you can see the effect because the n-gram
>>> length after the is 2, not 1.
>>
>> Does this happen when kenlm is called from Moses as well?
>>
>> There seem to me to be many reasons not to do this:  How do you know  
>> whether full sentences are being translated?  What if the translation  
>> model already includes sentence boundary tokens?  (See my recent  
>> message about why this might be desirable)
>>
>> But most importantly: How do you know whether the language model was  
>> trained that way?
>>
>> - John Burger
>>    MITRE
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support