Yes, the LM scores are calculated with <s> added to the beginning of 
the sentence (and </s> added to the end). Since a phrase-based decoder 
creates target sentences left-to-right, you always know where the 
beginning is. For chart decoding they're also added, but the input 
explicitly contains <s>...</s>, so they are only added when the first 
and last 'word' are translated.
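To make the effect concrete, here is a toy sketch (not Moses/KenLM code; the probabilities and the fallback value are made up for illustration) of scoring a sentence with a bigram LM, with and without the boundary tokens:

```python
# Toy illustration: bigram LM scoring with and without <s>/</s>.
# The log10 probabilities below are invented for this example.
import math

LOGPROB = {
    ("<s>", "hello"): -0.5,
    ("hello", "world"): -0.3,
    ("world", "</s>"): -0.4,
    ("hello",): -1.0,   # unigram fallback for unseen contexts
    ("world",): -1.2,
}

def score(words, add_boundaries=True):
    """Sum bigram log10 probabilities; back off to a unigram (or a
    fixed penalty) when a bigram is unseen."""
    if add_boundaries:
        words = ["<s>"] + words + ["</s>"]
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += LOGPROB.get((prev, cur), LOGPROB.get((cur,), -2.0))
    return total

print(score(["hello", "world"]))         # scores <s> hello world </s>
print(score(["hello", "world"], False))  # scores only hello world
```

With boundaries the score includes the (<s>, hello) and (world, </s>) transitions, which is why a model trained without boundary tokens would score such a query differently.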

It's usually better to have the <s> </s> symbols, but you're right 
that there are problems if they appear in your translation model. This 
was done in this paper, with some positive results:
    
http://www.umiacs.umd.edu/~ymarton/pub/wmt09/DyerSetiawanMartonResnik_wmt09_UMD-SMT-sys.pdf

Perhaps it should be an option.

> Does this happen when kenlm is called from Moses as well?
>
> There seem to me to be many reasons not to do this:  How do you know
> whether full sentences are being translated?  What if the translation
> model already includes sentence boundary tokens?  (See my recent
> message about why this might be desirable)
>
> But most importantly: How do you know whether the language model was
> trained that way?
>
> - John Burger
>     MITRE
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
