Hi,
Moses places sentence boundaries when computing the actual sentence
language model score - see the line
contextFactor[index++] = &languageModel.GetSentenceStartArray();
in Hypothesis.cpp.
But LM estimates are also computed for phrases for future cost estimation,
and there no sentence start token is inserted. Apparently a Vocab_None
token is inserted as filler material - not sure if that is smart. See the line
context[count-1] = Vocab_None;
in LanguageModelSRI.cpp.
-phi
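A minimal C++ sketch of the two cases described above, assuming SRILM's convention that `LM::wordProb(word, context)` receives the history most-recent-first in an array terminated by `Vocab_None`. Under that convention, a line like `context[count-1] = Vocab_None;` may simply be closing the context array rather than adding a filler word; that reading is an assumption, not something verified against the Moses source here. The code does not call Moses or SRILM: `kVocabNone`, `kSentStart`, `MakeContext`, and `ContextsFor` are hypothetical stand-ins that only show which context each word would be scored with, with and without a leading `<s>`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; real SRILM indices come from its Vocab class.
const int kVocabNone = -1;  // sentinel that terminates a context array
const int kSentStart = 0;   // index we pretend belongs to "<s>"

// Builds an SRILM-style context for the next word: most recent word first,
// at most (order - 1) words, terminated by kVocabNone.
std::vector<int> MakeContext(const std::vector<int>& history, int order) {
  assert(order >= 1);
  std::vector<int> context;
  for (std::size_t k = 0;
       k < history.size() && context.size() < static_cast<std::size_t>(order - 1);
       ++k) {
    context.push_back(history[history.size() - 1 - k]);
  }
  context.push_back(kVocabNone);
  return context;
}

// Returns the context that would accompany each word in a
// wordProb(word, context)-style call. With addSentenceStart the history is
// seeded with <s> (full-sentence scoring); without it nothing precedes the
// first word (the phrase-level future cost case).
std::vector<std::vector<int>> ContextsFor(const std::vector<int>& words,
                                          int order, bool addSentenceStart) {
  std::vector<int> history;
  if (addSentenceStart) history.push_back(kSentStart);
  std::vector<std::vector<int>> contexts;
  for (int w : words) {
    contexts.push_back(MakeContext(history, order));
    history.push_back(w);
  }
  return contexts;
}
```

With `order = 3` and the phrase `{5, 6, 7}`, the first word gets the context `{kSentStart, kVocabNone}` when `<s>` is added and only `{kVocabNone}` otherwise; from the third word on, both cases see the same two-word history.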
On Mon, Jan 5, 2009 at 7:16 PM, Felipe Sánchez Martínez
<[email protected]> wrote:
>
> Hi, happy 2009 to all of you!
>
> SRILM adds sentence boundaries by default (<s> and </s>); however, the
> latest version lets you avoid this by using the flags -no-sos -no-eos
> when training the language model with ngram-count.
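To illustrate what default boundary insertion means for the n-grams that get counted, here is a toy extractor. It is not SRILM code: `ExtractNgrams` and its `addSos`/`addEos` parameters are hypothetical, chosen only to mirror the default behaviour and the `-no-sos`/`-no-eos` flags, and it ignores any special treatment SRILM gives the boundary tokens themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not SRILM code: returns the n-grams of length n found
// in one training line, joined by spaces. addSos/addEos mimic the default
// padding; passing false mimics -no-sos / -no-eos.
std::vector<std::string> ExtractNgrams(const std::vector<std::string>& line,
                                       std::size_t n, bool addSos, bool addEos) {
  assert(n >= 1);
  std::vector<std::string> tokens;
  if (addSos) tokens.push_back("<s>");
  tokens.insert(tokens.end(), line.begin(), line.end());
  if (addEos) tokens.push_back("</s>");

  // Slide a window of length n over the (possibly padded) token sequence.
  std::vector<std::string> ngrams;
  for (std::size_t i = 0; i + n <= tokens.size(); ++i) {
    std::string gram = tokens[i];
    for (std::size_t j = 1; j < n; ++j) gram += " " + tokens[i + j];
    ngrams.push_back(gram);
  }
  return ngrams;
}
```

For the line `a b` and n = 2, the default padding yields `<s> a`, `a b`, and `b </s>`, while disabling both boundaries leaves only `a b`.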
>
> As for Moses, I think it does not add sentence boundaries
> before computing the language model score. I do not know how it behaves
> when computing the other scores.
>
> Regards,
> --
> Felipe.
>
> On Mon, 05-01-2009 at 17:33 +0100, Joerg Tiedemann wrote:
>> happy new year to all of you!
>>
>> I forgot to follow up on this topic of sentence boundaries in srilm and
>> moses. maybe I missed the answer - but I don't recall anyone having
>> answered the question below.
>>
>> how does moses do it? adding sentence boundaries or not? or does srilm
>> always assume that a string is a full sentence when called for computing
>> the LM score? what are the consequences for the incremental decoding
>> procedure in that case? and if sentence boundaries are added in the
>> internal calls to srilm - what happens when moses uses irstlm instead?
>>
>> could someone clarify? thanks in advance!
>>
>> jorg
>>
>>
>> >
>> > On Fri, 14-11-2008 at 08:21 +0100, Marcello Federico wrote:
>> >> Felipe,
>> >>
>> >> correct, irstlm does not add sentence boundaries.
>> >> irstlm uses them only if you add them to the data.
>> >>
>> >> srilm adds sentence boundaries by default around each
>> >> text line, but you can disable this (check the proper
>> >> option in the manual pages of ngram-count and ngram).
>> >>
>> >> i'm not sure about how moses calls srilm internally.
>> >> my guess is that only single n-grams are passed to
>> >> srilm and that no sentence boundary symbols are
>> >> introduced by moses.
>> >>
>> >> marcello
>> >>
>> >> ________________________________________
>> >> From: [email protected] [[email protected]] On
>> >> Behalf Of J.Tiedemann [[email protected]]
>> >> Sent: Thursday, November 13, 2008 11:06 PM
>> >> To: [email protected]; [email protected]
>> >> Subject: Re: [Moses-support] Translating words or phrases in isolation
>> >>
>> >> I'm not 100% sure but I think that IRSTLM does not add sentence
>> >> boundary tokens. maybe that's an option?
>> >>
>> >> jorg
>> >>
>> >>
>> >> On Thu, 13 Nov 2008 20:58:54 +0100
>> >> Felipe Sánchez Martínez <[email protected]> wrote:
>> >>> Hi all,
>> >>>
>> >>> I am using Moses to obtain translation candidates (in the form of
>> >>> n-best lists) for phrases or words in isolation; that is, I am not
>> >>> translating whole (well-formed) sentences.
>> >>>
>> >>> Does SRILM (the language model I am using with Moses) introduce a
>> >>> begin-of-sentence token before computing the likelihood of the input
>> >>> sentence (in my case a phrase or a word)?
>> >>>
>> >>> If the answer to the previous question is yes, how could I avoid
>> >>> that?
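A toy example of why the answer matters when scoring a phrase in isolation. `ToyBigramLM` and `ScorePhrase` are hypothetical, the log10 probabilities are invented, and unseen bigrams fall back to the unigram value without the backoff weight a real back-off model would apply. Conditioning the first word on `<s>` treats the phrase as a sentence opening, which can raise or lower its score relative to scoring that word with no history.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy bigram model with invented log10 probabilities.
struct ToyBigramLM {
  std::map<std::string, double> unigram;
  std::map<std::pair<std::string, std::string>, double> bigram;

  // An empty prev means "no history": use the unigram value. Unseen bigrams
  // also fall back to the unigram value (no backoff weight, for brevity).
  double LogProb(const std::string& prev, const std::string& word) const {
    assert(unigram.count(word) > 0 && "word must be in the toy vocabulary");
    if (!prev.empty()) {
      auto it = bigram.find({prev, word});
      if (it != bigram.end()) return it->second;
    }
    return unigram.at(word);
  }
};

// Scores a phrase word by word. With useSentenceStart the first word is
// conditioned on "<s>"; otherwise it gets its unigram probability.
double ScorePhrase(const ToyBigramLM& lm, const std::vector<std::string>& phrase,
                   bool useSentenceStart) {
  double total = 0.0;
  std::string prev = useSentenceStart ? "<s>" : "";
  for (const std::string& w : phrase) {
    total += lm.LogProb(prev, w);
    prev = w;
  }
  return total;
}
```

With log P(the | <s>) = -0.5, log P(the) = -1.0 and log P(house | the) = -1.5, the phrase `the house` scores -2.0 with `<s>` and -2.5 without it.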
>> >>>
>> >>> Thank you very much in advance,
>> >>>
>> >>> Kind regards
>> >>>
>> >>> --
>> >>> Felipe
>> >>>
>> >>> _______________________________________________
>> >>> Moses-support mailing list
>> >>> [email protected]
>> >>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
> --
> Felipe Sánchez Martínez <[email protected]>
> Departamento de Lenguajes y Sistemas Informáticos
> Universidad de Alicante, E-03071 Alicante (Spain)
> Tel.: +34 965 903 400, ext: 2038 Fax: +34 965 909 326
> http://www.dlsi.ua.es/~fsanchez
>