kenlm's query tool implicitly places <s> at the beginning of each sentence. It doesn't appear in the output, but you can see its effect: the n-gram length reported for "the" is 2, not 1.
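To illustrate the point, here is a toy Python sketch of ARPA-style backoff scoring. The bigram model, its numbers, and the functions score_word and score_sentence are hypothetical. They only mimic the behavior described in this thread: <s> is used as context but never scored, </s> is appended and scored, and an unknown word reports n-gram length 0. An unknown word is charged its context's backoff weights plus a log probability of 0, which is the substitute kenlm reports when the model lacks <unk>. This is not kenlm's implementation.

```python
# Toy sketch of ARPA-style backoff scoring; the model below is hypothetical,
# not english.5gram.lm, and this is not kenlm's actual implementation.
ORDER = 2  # toy bigram model; the context slicing below assumes ORDER >= 2

# log10 probabilities of the n-grams in the toy model
PROBS = {
    ("<s>",): -99.0,  # ARPA convention: <s> is only ever used as context
    ("the",): -1.5,
    ("cat",): -3.0,
    ("</s>",): -1.2,
    ("<s>", "the"): -0.9,
    ("the", "cat"): -1.1,
    ("cat", "</s>"): -0.5,
}
# log10 backoff weights; contexts not listed have a backoff of 0
BACKOFFS = {("<s>",): -0.3, ("the",): -0.4, ("cat",): -0.2}
UNK_LOGPROB = 0.0  # mimics "Language model is missing <unk>. Substituting probability 0."


def score_word(context, word):
    """Return (log10 prob, matched n-gram length) for word given context."""
    penalty = 0.0
    # Try the longest context first, dropping one word of history at a time.
    for start in range(len(context) + 1):
        suffix = tuple(context[start:])
        ngram = suffix + (word,)
        if ngram in PROBS:
            return penalty + PROBS[ngram], len(ngram)
        # Not found: pay the backoff weight of the context being shortened.
        penalty += BACKOFFS.get(suffix, 0.0)
    # Unknown word: n-gram length 0, backoffs already charged, plus <unk> prob.
    return penalty + UNK_LOGPROB, 0


def score_sentence(words, bos=True):
    """Score words plus an appended </s>; <s> is context only when bos=True."""
    history = ["<s>"] if bos else []
    total = 0.0
    details = []
    for word in list(words) + ["</s>"]:
        context = history[-(ORDER - 1):]
        logprob, length = score_word(context, word)
        details.append((word, length, logprob))
        total += logprob
        history.append(word)
    return total, details


if __name__ == "__main__":
    for bos in (True, False):
        total, details = score_sentence(["the", "cat"], bos=bos)
        print("with <s>" if bos else "without <s>")
        for word, length, logprob in details:
            print(word, length, logprob)
        print("Total:", total)
```

With <s> as context, "the" matches the bigram "<s> the" and reports length 2. Without it, only the unigram matches and the length is 1, which is the difference visible in the query output.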
The difference between the kenlm and SRILM results comes from the unknown word "74th": -55.599 + 1.13665 = -54.46235, which matches SRILM's -54.4623. The term -1.13665 appears to be the LM's backoff weight for the unigram "and", the word preceding "74th"; kenlm includes it and SRILM does not. I think including the backoff is the right thing to do here, and it's how Moses configures SRILM to operate (so you may want to look at LanguageModelSRI.cpp and copy how it initializes SRI). As for IRSTLM, I hope its developers find the n-gram lengths and probabilities after each word useful in explaining that difference.

Kenneth

On 10/29/10 08:55, Felipe Sánchez Martínez wrote:
> Hi Kenneth,
>
> The output of kenlm/query is:
>
> Loading the LM will be faster if you build a binary file.
> Reading english.5gram.lm
> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> ****************************************************************************************
> Language model is missing <unk>. Substituting probability 0.
> ************
>
> Loading statistics:
> user 18.0001
> sys 0.00047
> rss 316632 kB
> the 2 -0.894835 fifth 3 -3.34651 committee 2 -3.04771 resumed 1 -5.3955
> its 2 -1.99768 consideration 2 -3.4901 of 3 -0.281781 the 4 -0.240104
> item 3 -4.40691 at 2 -2.55249 its 2 -2.06475 64th 1 -7.43317 and 1
> -2.20519 74th 0 -1.13665 meetings 1 -3.82205 , 2 -1.05335 on 3 -2.12476
> 15 3 -2.54839 may 4 -1.06142 and 4 -1.42049 2 3 -2.24962 june 4
> -0.381742 2000 2 -1.75696 . 3 -0.68658 </s> 4 -0.000255845 Total: -55.599
> After queries:
> user 18.0001
> sys 0.00047
> rss 316656 kB
> Total time including destruction:
> user 18.0001
> sys 0.00051
> rss 1312 kB
>
> It seems that it is adding the end-of-sentence token, but not the
> begin-of-sentence token.
>
> The score (-55.599) is different from SRILM (-54.4623) and from IRSTLM
> (-49.9141, or -55.3099 when adding <s> and </s>).
>
> Thanks for your help
> --
> Felipe
>
> On 28/10/10 18:57, Kenneth Heafield wrote:
>> Hi Felipe,
>>
>> Please run $recent_moses_build/kenlm/query langmodel.lm < text and post
>> the output (you don't need the statistics, just the line containing
>> "Total:"). That will tell you the score and n-gram length at each word.
>>
>> Kenneth
>>
>> On 10/28/10 12:42, Felipe Sánchez Martínez wrote:
>>> Hello all,
>>>
>>> My question is about SRILM and IRSTLM; it is not directly related to
>>> Moses, but I did not know where to ask.
>>>
>>> I am scoring individual sentences with a 5-gram language model and I get
>>> different scores with SRILM and IRSTLM.
>>>
>>> The language model was trained with SRILM through the following command
>>> line:
>>>
>>> $ srilm/bin/i686-m64/ngram-count -order $(LM_ORDER) -interpolate
>>> -kndiscount -text text.txt -lm langmodel.lm
>>>
>>> I do not know why I get different scores when scoring the same sentence.
>>> In this regard I have a few questions:
>>> * Does SRILM introduce begin-of-sentence and end-of-sentence tokens
>>> during training?
>>> * And during scoring (or decoding)?
>>> * Does IRSTLM introduce begin-of-sentence and end-of-sentence tokens
>>> during scoring (or decoding)?
>>> * I know SRILM uses log base 10. Does IRSTLM also use log base 10? (It
>>> seems so.)
>>>
>>> When I score the English sentence "the fifth committee resumed its
>>> consideration of the item at its 64th and 74th meetings , on 15 may and
>>> 2 june 2000 ." the scores (log prob) I get are:
>>> SRILM: -54.4623
>>> IRSTLM: -49.9141
>>>
>>> If I introduce <s> and </s> when scoring with IRSTLM I get a log prob of
>>> -55.3099 (very similar to that of SRILM).
>>>
>>> The code to score with IRSTLM was borrowed from Moses.
>>>
>>> Thank you very much for your help.
>>>
>>> Regards.
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
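To check the arithmetic in the reply, here is a short Python sketch that sums the per-word log10 probabilities copied from the quoted kenlm/query output. It assumes the output is a flat sequence of word, n-gram length, and log10 probability triples ending at "Total:". It also treats the single entry with n-gram length 0 ("74th") as the out-of-vocabulary word that SRILM leaves out of its score. The function names parse_query_line and totals are illustrative only.

```python
# Per-word output of kenlm/query copied from the quoted message:
# word, n-gram length, log10 probability, repeated, then "Total:".
QUERY_OUTPUT = """
the 2 -0.894835 fifth 3 -3.34651 committee 2 -3.04771 resumed 1 -5.3955
its 2 -1.99768 consideration 2 -3.4901 of 3 -0.281781 the 4 -0.240104
item 3 -4.40691 at 2 -2.55249 its 2 -2.06475 64th 1 -7.43317 and 1
-2.20519 74th 0 -1.13665 meetings 1 -3.82205 , 2 -1.05335 on 3 -2.12476
15 3 -2.54839 may 4 -1.06142 and 4 -1.42049 2 3 -2.24962 june 4
-0.381742 2000 2 -1.75696 . 3 -0.68658 </s> 4 -0.000255845 Total: -55.599
"""


def parse_query_line(text):
    """Split the query output into (word, ngram_length, log10_prob) triples."""
    tokens = text.split()
    # Everything before "Total:" is a flat sequence of triples.
    tokens = tokens[: tokens.index("Total:")]
    if len(tokens) % 3 != 0:
        raise ValueError("expected word/length/logprob triples")
    return [
        (tokens[i], int(tokens[i + 1]), float(tokens[i + 2]))
        for i in range(0, len(tokens), 3)
    ]


def totals(triples):
    """Return (kenlm-style total, total with OOV words skipped)."""
    full = sum(logprob for _, _, logprob in triples)
    # An n-gram length of 0 marks a word the model does not know ("74th").
    known_only = sum(logprob for _, length, logprob in triples if length > 0)
    return full, known_only


if __name__ == "__main__":
    full, known_only = totals(parse_query_line(QUERY_OUTPUT))
    print(f"kenlm total: {full:.4f}")        # kenlm total: -55.5990
    print(f"without OOV: {known_only:.4f}")  # without OOV: -54.4623
```

Dropping the one length-0 entry turns kenlm's -55.599 into SRILM's -54.4623, which is the difference attributed to the backoff weight in the reply.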
