Based on last year's eos marker discussions, we started using alternate sos/eos markers in both the parallel and LM corpora. We settled on two obscure Unicode characters: U+1D179 (Musical Symbol Begin Phrase) and U+1D17A (Musical Symbol End Phrase). As in standard corpus preparation, the parallel corpora do not use <s></s> and the LM corpora do. We've seen significant improvement in results without needing to reorder the placement of the <s></s> tags.
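For anyone who wants to try the same thing, here is a minimal sketch of the marking step. It is not our actual preparation script: the function names are made up for illustration, and it assumes the corpus is already tokenized, one sentence per line, with the markers added as separate whitespace-delimited tokens.

```python
# Sketch only: add_markers and mark_corpus are hypothetical names,
# not part of Moses, SRILM, or any other toolkit.
SOS = "\U0001D179"  # U+1D179 MUSICAL SYMBOL BEGIN PHRASE
EOS = "\U0001D17A"  # U+1D17A MUSICAL SYMBOL END PHRASE


def add_markers(line: str) -> str:
    """Wrap one tokenized sentence in the alternate sos/eos markers."""
    tokens = line.split()
    return " ".join([SOS, *tokens, EOS])


def mark_corpus(lines):
    """Yield marked sentences, skipping blank lines (an assumption)."""
    for line in lines:
        if line.strip():
            yield add_markers(line)


if __name__ == "__main__":
    for marked in mark_corpus(["foo bar .", "", "hello world"]):
        print(marked)
```

Both code points lie outside the Basic Multilingual Plane, so every tool in the pipeline has to handle 4-byte UTF-8 sequences correctly.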
On 2013-02-11 00:43, Kenneth Heafield wrote:
> On 02/10/13 17:21, John Joseph Morgan wrote:
>
>> Hello all, My understanding is that an end of sentence marker is inserted by the decoder at some point in the decoding process to give the complete sentence higher probability than shorter segments of the sentence. Is this correct?
>
> No. Inserting the eos marker gives the complete sentence lower
> probability. p(</s> | foo bar .) < 1. It's inserted to model the end
> of sentence.
>
>> If so, can the decoder be configured to not insert the eos marker? srilm's ngram-count has a -no-eos option, is there a similar option for the decoder?
>
> There is no command line option to disable </s>.
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
