Based on last year's eos marker discussions, we started using
alternate sos/eos markers in both parallel and lm corpora. We settled on
two obscure Unicode characters, U+1D179 (Musical Symbol Begin Phrase) and
U+1D17A (Musical Symbol End Phrase). As in standard corpus preparation,
the parallel corpora do not use <s></s> and the lm corpora do. We've
seen significant improvement in results without needing to reorder the
placement of <s></s> tags.
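
For anyone who wants to try this, here is a minimal sketch of the marking step. It is not our actual preprocessing script: the helper names are hypothetical, and it assumes tokenized, one-sentence-per-line input with the markers added as separate space-delimited tokens.

```python
# Alternate sentence-boundary markers named in the message above.
SOS = "\U0001D179"  # U+1D179 MUSICAL SYMBOL BEGIN PHRASE
EOS = "\U0001D17A"  # U+1D17A MUSICAL SYMBOL END PHRASE


def add_markers(line: str) -> str:
    """Wrap one tokenized sentence in the alternate markers.

    Hypothetical helper: assumes whitespace-tokenized input and that the
    markers should appear as standalone tokens. Blank lines stay blank.
    """
    tokens = line.split()
    if not tokens:
        return ""
    return " ".join([SOS, *tokens, EOS])


def mark_corpus(lines):
    """Apply add_markers to every sentence of a corpus (list of strings)."""
    return [add_markers(line) for line in lines]
```

Calling mark_corpus(["foo bar ."]) returns a one-element list whose sentence starts with the Begin Phrase character and ends with the End Phrase character. Applying the same function to both sides of the parallel corpus and to the lm corpus keeps the markers consistent across all three files.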

On 2013-02-11 00:43, Kenneth Heafield wrote:

> On 02/10/13 17:21, John Joseph Morgan wrote:
>
>> Hello all,
>> My understanding is that an end of sentence marker is inserted by the
>> decoder at some point in the decoding process to give the complete
>> sentence higher probability than shorter segments of the sentence. Is
>> this correct?
>
> No. Inserting the eos marker gives the complete sentence lower
> probability. p(</s> | foo bar .) < 1. It's inserted to model the end
> of sentence.
>
>> If so, can the decoder be configured to not insert the eos marker?
>> srilm's ngram-count has a -no-eos option; is there a similar option
>> for the decoder?
>
> There is no command line option to disable </s>.
 

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
