It looks like Moses, by default at least, implicitly adds <s> and </s> to the target side for language model scoring purposes. This behavior is independent of which LM is used inside Moses, and I don't know whether there is an option to disable it. I can tell you that language models are not designed to score internal <s> and </s> tokens, so you will get strange results if they are duplicated.
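If you want to see the effect concretely, here is a minimal sketch using the kenlm Python bindings rather than Moses itself (the bindings are not what Moses calls internally, and "example.arpa" is just a placeholder for whatever model you have on hand):

    # Sketch only: illustrates why duplicated boundary tokens score oddly.
    import kenlm

    lm = kenlm.Model("example.arpa")  # placeholder model file

    sentence = "this is a test"

    # Normal case: the scorer adds <s> and </s> itself via the bos/eos flags,
    # analogous to Moses wrapping the target hypothesis.
    clean = lm.score(sentence, bos=True, eos=True)

    # If the text already contains literal boundary tokens AND the scorer adds
    # its own, the internal <s> / </s> get scored as ordinary (and usually
    # near-impossible) words, so the total drops sharply.
    duplicated = lm.score("<s> " + sentence + " </s>", bos=True, eos=True)

    print("clean      :", clean)
    print("duplicated :", duplicated)

    # full_scores exposes the n-gram order used for each word; with bos=True
    # the first word is typically scored as a bigram conditioned on the
    # implicit <s>, which is the "2, not 1" effect discussed below for
    # kenlm's query tool.
    for prob, ngram_len, oov in lm.full_scores(sentence, bos=True, eos=True):
        print(prob, ngram_len, oov)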
Kenneth

On 10/29/10 10:37, Kenneth Heafield wrote:
> That documentation was specific to kenlm's query tool. kenlm does the
> same thing as SRI with respect to sentence boundary tokens. As to what
> that is, I'm deferring to Edinburgh.
>
> Kenneth
>
> On 10/29/10 10:28, John Burger wrote:
>> Kenneth Heafield wrote:
>>
>>> kenlm's query tool implicitly places <s> at the beginning. It doesn't
>>> appear in the output, but you can see the effect because the n-gram
>>> length after the <s> is 2, not 1.
>>
>> Does this happen when kenlm is called from Moses as well?
>>
>> There seem to me to be many reasons not to do this: How do you know
>> whether full sentences are being translated? What if the translation
>> model already includes sentence boundary tokens? (See my recent
>> message about why this might be desirable.)
>>
>> But most importantly: How do you know whether the language model was
>> trained that way?
>>
>> - John Burger
>> MITRE

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
