Yes, the LM scores are calculated with <s> added to the beginning of
the sentence (and </s> added to the end). Since a phrase-based decoder
creates target sentences left-to-right, you always know where the
beginning is. For chart decoding they're also added, but the input
explicitly contains <s>...</s>, so they're only added when the first
and last 'words' are translated.
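To make the effect concrete, here's a toy sketch (not Moses or KenLM code; the bigram probabilities are invented for illustration) showing how including <s>/</s> in the n-gram contexts changes a sentence's score:

```python
# Toy illustration: how an n-gram LM score changes when <s> and </s>
# boundary tokens are wrapped around the sentence before scoring.
# All probabilities below are made up for the example.

# Hypothetical bigram log10 probabilities.
BIGRAM_LOGP = {
    ("<s>", "the"): -0.5,
    ("the", "cat"): -1.0,
    ("cat", "</s>"): -0.7,
}
UNK_LOGP = -3.0  # fallback score for unseen bigrams

def score(words, add_boundaries=True):
    """Sum bigram log10 probs, optionally wrapping the sentence
    in <s> ... </s> as the decoder does."""
    seq = ["<s>"] + words + ["</s>"] if add_boundaries else words
    return sum(BIGRAM_LOGP.get(bg, UNK_LOGP) for bg in zip(seq, seq[1:]))

# With boundaries, likely sentence starts/ends are rewarded:
print(score(["the", "cat"]))                        # -0.5 + -1.0 + -0.7 = -2.2
print(score(["the", "cat"], add_boundaries=False))  # only ("the","cat") = -1.0
```

The point of the sketch: if the LM was trained with boundary tokens but the decoder doesn't add them (or vice versa), the scores for sentence-initial and sentence-final words are computed against the wrong contexts.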
It's usually better to have the <s>/</s> symbols, but you're right
that there are problems if they appear in your translation model. This
was done in this paper with some positive results:
http://www.umiacs.umd.edu/~ymarton/pub/wmt09/DyerSetiawanMartonResnik_wmt09_UMD-SMT-sys.pdf
Perhaps it should be an option.
> Does this happen when kenlm is called from Moses as well?
>
> There seem to me to be many reasons not to do this: How do you know
> whether full sentences are being translated? What if the translation
> model already includes sentence boundary tokens? (See my recent
> message about why this might be desirable)
>
> But most importantly: How do you know whether the language model was
> trained that way?
>
> - John Burger
> MITRE
>
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support