Hieu Hoang wrote:
> i do remember talking about this a few years ago.
>
> It's possible in the hiero/syntax model. Indeed, the SOS/EOS can be modeled
> in the phrase table by using the argument
>    --BoundaryRules
> when running
>    extract-rules
>
> however, it's slightly more complicated to implement in the phrase-based
> model since you have to make sure the EOS/SOS aren't reordered.

Yeah, I just had a post-process that removed mid-sentence SOS/EOS, and added them back to the boundaries if necessary.

> If you're willing to implement it, I'll certainly help out

Unfortunately I don't have funding to work on this right now, but that might change in the medium term.

- JB

> On 11 February 2013 16:17, Burger, John D. <[email protected]> wrote:
> +1 on a way to turn off the automatic insertion of the sentence boundary
> pseudo-tokens, which was implemented a few years ago. I've requested this in
> the past, but the answer then was the same, that this would be complicated by
> the rule-based models.
>
> I think this is worth thinking about, though, because in past experiments I
> found modest but consistent gains in modeling sentence boundaries in the
> phrase table as well as the language model. The current setup makes this
> difficult without hacking in strings of multiple pseudo-tokens.
>
> - John Burger
> MITRE
>
> On Feb 10, 2013, at 12:43 PM, Kenneth Heafield wrote:
>
> > On 02/10/13 17:21, John Joseph Morgan wrote:
> >> Hello all,
> >> My understanding is that an end of sentence marker is inserted by the
> >> decoder at some point in the decoding process to give the complete
> >> sentence higher probability than shorter segments of the sentence.
> >> Is this correct?
> >
> > No. Inserting the eos marker gives the complete sentence lower
> > probability. p(</s> | foo bar .) < 1. It's inserted to model the end
> > of sentence.
> >
> >> If so, can the decoder be configured to not insert the eos marker?
> >> srilm's ngram-count has a -no-eos option, is there a similar option for
> >> the decoder?
> >
> > There is no command line option to disable </s>.
> >
> >> What are the relevant files where this is coded?
> >
> > For phrase-based KenLM, moses/LM/Ken.cpp:255. For phrase-based with
> > other lms, moses/LM/Implementation.cpp near 171. For syntax, see
> > moses/Sentence.cpp near 187 but beware that </s> controls when the glue
> > rule applies.
> >
> >> Thanks,
> >> John
> >>
> >> _______________________________________________
> >> Moses-support mailing list
> >> [email protected]
> >> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk
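
For anyone wanting to try the post-process mentioned at the top of this thread, here is a minimal sketch. It assumes whitespace-tokenized decoder output and the usual <s> and </s> boundary tokens. The function name normalize_boundaries is hypothetical and not part of Moses. Rather than finding only the mid-sentence markers, it drops every boundary token and re-adds one pair at the edges, which should give the same result.

```python
# Hypothetical helper, not part of Moses: clean up sentence-boundary
# pseudo-tokens in one line of whitespace-tokenized decoder output.
SOS, EOS = "<s>", "</s>"


def normalize_boundaries(line: str, sos: str = SOS, eos: str = EOS) -> str:
    """Remove all boundary tokens, then wrap the rest in a single SOS/EOS pair."""
    # Dropping every marker and re-adding one pair is equivalent to removing
    # only the mid-sentence ones and restoring missing edge markers.
    tokens = [t for t in line.split() if t not in (sos, eos)]
    return " ".join([sos, *tokens, eos])


if __name__ == "__main__":
    import sys

    # Filter usage: read decoder output on stdin, write cleaned lines to stdout.
    for raw in sys.stdin:
        print(normalize_boundaries(raw))
```

It could be used as a filter on the decoder output, one sentence per line. Whether the markers should stay in the final output at all depends on what consumes it downstream.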
