Hieu Hoang wrote:

> I do remember talking about this a few years ago.
> 
> It's possible in the hiero/syntax model. Indeed, the SOS/EOS can be modeled 
> in the phrase table by using the argument
>    --BoundaryRules
> when running 
>    extract-rules
> 
> however, it's slightly more complicated to implement in the phrase-based 
> model since you have to make sure the EOS/SOS aren't reordered.

Yeah, I just had a post-process that removed mid-sentence SOS/EOS, and added 
them back to the boundaries if necessary.
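
A minimal sketch of that kind of post-process, under stated assumptions: the 
decoder output is whitespace-tokenized, the boundary pseudo-tokens appear 
literally as <s> and </s>, and the function name fix_boundaries is made up 
for illustration (it is not part of Moses). Stripping every boundary token 
and re-adding one pair at the edges has the same effect as removing the 
mid-sentence ones and restoring any that are missing.

```python
SOS = "<s>"   # assumed spelling of the start-of-sentence pseudo-token
EOS = "</s>"  # assumed spelling of the end-of-sentence pseudo-token


def fix_boundaries(line: str) -> str:
    """Return the line with exactly one SOS at the start and one EOS at the end.

    Hypothetical helper: drops every boundary pseudo-token wherever it
    occurs, then wraps the remaining tokens in a single SOS/EOS pair.
    """
    tokens = [tok for tok in line.split() if tok not in (SOS, EOS)]
    return " ".join([SOS, *tokens, EOS])


if __name__ == "__main__":
    # A hypothesis in which reordering moved the boundary tokens mid-sentence.
    print(fix_boundaries("foo </s> bar <s> ."))
```

If the boundary tokens should not appear in the final output, the same 
filter without the re-wrapping step would do.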

> If you're willing to implement it, I'll certainly help out

Unfortunately I don't have funding to work on this right now, but that might 
change in the medium term.

- JB

> On 11 February 2013 16:17, Burger, John D. <[email protected]> wrote:
> +1 on a way to turn off the automatic insertion of the sentence boundary 
> pseudo-tokens, which was implemented a few years ago.  I've requested this in 
> the past, but the answer then was the same, that this would be complicated by 
> the rule-based models.
> 
> I think this is worth thinking about, though, because in past experiments I 
> found modest but consistent gains in modeling sentence boundaries in the 
> phrase table as well as the language model.  The current setup makes this 
> difficult without hacking in strings of multiple pseudo-tokens.
> 
> - John Burger
>   MITRE
> 
> On Feb 10, 2013, at 12:43 PM, Kenneth Heafield wrote:
> 
> > On 02/10/13 17:21, John Joseph Morgan wrote:
> >> Hello all,
> >> My understanding is that an end-of-sentence marker is inserted by the 
> >> decoder at some point in the decoding process to give the complete 
> >> sentence higher probability than shorter segments of the sentence.
> >> Is this correct?
> >
> > No.  Inserting the eos marker gives the complete sentence lower
> > probability.  p(</s> | foo bar .) < 1.  It's inserted to model the end
> > of sentence.
> >
> >> If so, can the decoder be configured to not insert the eos marker?
> >> srilm's ngram-count has a -no-eos option, is there a similar option for 
> >> the decoder?
> >
> > There is no command line option to disable </s>.
> >
> >> What are the relevant files where this is coded?
> >
> > For phrase-based KenLM, moses/LM/Ken.cpp:255.  For phrase-based with
> > other lms, moses/LM/Implementation.cpp near 171.  For syntax, see
> > moses/Sentence.cpp near 187 but beware that </s> controls when the glue
> > rule applies.
> >
> >> Thanks,
> >> John
> >>
> >>
> >>
> >> _______________________________________________
> >> Moses-support mailing list
> >> [email protected]
> >> http://mailman.mit.edu/mailman/listinfo/moses-support
> >>
> 
> 
> 
> 
> 
> -- 
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk

