Hi, and happy 2009 to all of you!

SRILM adds sentence boundaries (<s> and </s>) by default; however, the latest version lets you avoid this by passing the flags -no-sos and -no-eos to ngram-count when training the language model.
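To make the effect of the boundary tokens concrete, here is a toy sketch in Python. It is not SRILM code: the function name count_ngrams and the sample phrases are made up for illustration, and it only counts n-grams, it does not estimate a model. It shows which n-grams exist with and without <s> and </s> when the input is an isolated phrase or word.

```python
from collections import Counter


def count_ngrams(lines, order=2, add_boundaries=True):
    """Count n-grams of the given order over whitespace-tokenized lines.

    Hypothetical helper for illustration only. With add_boundaries=True
    each line is wrapped in <s> ... </s>, mimicking the default behaviour
    described above; with add_boundaries=False the line is counted as-is,
    which is the situation the -no-sos -no-eos flags are meant to give.
    """
    counts = Counter()
    for line in lines:
        tokens = line.split()
        if add_boundaries:
            tokens = ["<s>"] + tokens + ["</s>"]
        # Slide a window of `order` tokens over the (possibly padded) line.
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1
    return counts


# Isolated phrases rather than full sentences (made-up sample data).
phrases = ["the house", "house"]

with_boundaries = count_ngrams(phrases, order=2, add_boundaries=True)
without_boundaries = count_ngrams(phrases, order=2, add_boundaries=False)

print(with_boundaries)
print(without_boundaries)
```

With boundaries, the single word "house" still contributes bigrams such as (<s>, house), so the counts treat it as a complete sentence; without them, it contributes no bigram at all.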
With respect to Moses, I think that it does not add sentence boundaries before computing the language model score. I do not know how it behaves when computing the rest of the scores.

Regards,
--
Felipe.

On Mon, 05-01-2009 at 17:33 +0100, Joerg Tiedemann wrote:
> happy new year to all of you!
>
> I forgot to follow up on this topic of sentence boundaries in srilm and
> moses. maybe I missed the answer - but I don't recall that someone
> answered the discussion below.
>
> how does moses do it? adding sentence boundaries or not? or does srilm
> always assume that a string is a full sentence when called for computing
> the LM score? what are the consequences for the incremental decoding
> procedure in that case? and if sentence boundaries are added in the
> internal calls to srilm - what happens when moses uses irstlm instead?
>
> could someone clarify? thanks in advance!
>
> jorg
>
>
> > On Fri, 14-11-2008 at 08:21 +0100, Marcello Federico wrote:
> >> Felipe,
> >>
> >> correct, irstlm does not add sentence boundaries.
> >> irstlm uses them only if you add them to the data.
> >>
> >> srilm adds sentence boundaries by default around each
> >> text line but you can disable this operation (check proper
> >> option in the manual page of ngram-count and ngram).
> >>
> >> i'm not sure about how moses calls srilm internally.
> >> my guess is that only single n-grams are passed to
> >> srilm and that no sentence boundary symbols are
> >> introduced by moses.
> >>
> >> marcello
> >>
> >> ________________________________________
> >> From: [email protected] [[email protected]] On
> >> Behalf Of J.Tiedemann [[email protected]]
> >> Sent: Thursday, November 13, 2008 11:06 PM
> >> To: [email protected]; [email protected]
> >> Subject: Re: [Moses-support] Translating words or phrases in isolation
> >>
> >> I'm not 100% sure but I think that IRSTLM does not add sentence
> >> boundary tokens. maybe that's an option?
> >>
> >> jorg
> >>
> >>
> >> On Thu, 13 Nov 2008 20:58:54 +0100
> >> Felipe Sánchez Martínez <[email protected]> wrote:
> >>> Hi all,
> >>>
> >>> I am using Moses to obtain translation candidates (in the form of
> >>> n-best lists) for phrases or words in isolation; that is, I am not
> >>> translating whole (well-formed) sentences.
> >>>
> >>> Does SRILM (the language model I am using with Moses) introduce a
> >>> begin-of-sentence token before computing the likelihood of the input
> >>> sentence (in my case a phrase or a word)?
> >>>
> >>> If the answer to the previous question is yes, how could I avoid
> >>> that?
> >>>
> >>> Thank you very much in advance,
> >>>
> >>> Kind regards
> >>>
> >>> --
> >>> Felipe
> >>>
> >>> _______________________________________________
> >>> Moses-support mailing list
> >>> [email protected]
> >>> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Felipe Sánchez Martínez <[email protected]>
Departamento de Lenguajes y Sistemas Informáticos
Universidad de Alicante, E-03071 Alicante (Spain)
Tel.: +34 965 903 400, ext: 2038
Fax: +34 965 909 326
http://www.dlsi.ua.es/~fsanchez
