Hi Kenneth,

The output of kenlm/query is:
Loading the LM will be faster if you build a binary file.
Reading english.5gram.lm
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************
Language model is missing <unk>. Substituting probability 0.
************
Loading statistics: user 18.0001 sys 0.00047 rss 316632 kB
the 2 -0.894835
fifth 3 -3.34651
committee 2 -3.04771
resumed 1 -5.3955
its 2 -1.99768
consideration 2 -3.4901
of 3 -0.281781
the 4 -0.240104
item 3 -4.40691
at 2 -2.55249
its 2 -2.06475
64th 1 -7.43317
and 1 -2.20519
74th 0 -1.13665
meetings 1 -3.82205
, 2 -1.05335
on 3 -2.12476
15 3 -2.54839
may 4 -1.06142
and 4 -1.42049
2 3 -2.24962
june 4 -0.381742
2000 2 -1.75696
. 3 -0.68658
</s> 4 -0.000255845
Total: -55.599
After queries: user 18.0001 sys 0.00047 rss 316656 kB
Total time including destruction: user 18.0001 sys 0.00051 rss 1312 kB

It seems that it is adding the end-of-sentence token, but not the
begin-of-sentence token. The score (-55.599) differs from SRILM's
(-54.4623) and from IRSTLM's (-49.9141, or -55.3099 when adding <s>
and </s>).

Thanks for your help

--
Felipe

On 28/10/10 18:57, Kenneth Heafield wrote:
> Hi Felipe,
>
> Please run $recent_moses_build/kenlm/query langmodel.lm < text and post
> the output (you don't need the statistics, just the line containing
> "Total:"). That will tell you the score and n-gram length at each word.
>
> Kenneth
>
> On 10/28/10 12:42, Felipe Sánchez Martínez wrote:
>> Hello all,
>>
>> My question is about SRILM and IRSTLM; it is not directly related to
>> Moses, but I did not know where else to ask.
>>
>> I am scoring individual sentences with a 5-gram language model and I
>> get different scores with SRILM and IRSTLM.
>>
>> The language model was trained with SRILM through the following
>> command line:
>>
>> $ srilm/bin/i686-m64/ngram-count -order $(LM_ORDER) -interpolate
>>   -kndiscount -text text.txt -lm langmodel.lm
>>
>> I do not know why I get different scores when scoring the same
>> sentence. In this regard I have a few questions:
>> * Does SRILM introduce begin-of-sentence and end-of-sentence tokens
>>   during training?
>> * And during scoring (or decoding)?
>> * Does IRSTLM introduce begin-of-sentence and end-of-sentence tokens
>>   during scoring (or decoding)?
>> * I know SRILM uses log base 10. Does IRSTLM also use log base 10?
>>   (It seems so.)
>>
>> When I score the English sentence "the fifth committee resumed its
>> consideration of the item at its 64th and 74th meetings , on 15 may
>> and 2 june 2000 ." the scores (log prob) I get are:
>> SRILM: -54.4623
>> IRSTLM: -49.9141
>>
>> If I introduce <s> and </s> when scoring with IRSTLM I get a log prob
>> of -55.3099 (very similar to that of SRILM).
>>
>> The code to score with IRSTLM was borrowed from Moses.
>>
>> Thank you very much for your help.
>>
>> Regards.
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Felipe Sánchez Martínez
Departamento de Lenguajes y Sistemas Informáticos
Universidad de Alicante, E-03071 Alicante (Spain)
Tel.: +34 965 903 400, ext: 2966
Fax: +34 965 909 326
http://www.dlsi.ua.es/~fsanchez
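P.S. As a sanity check on the kenlm/query numbers above, the per-word
log10 probabilities can be summed to reproduce the reported Total. A
minimal sketch; the whitespace-separated word / n-gram-length /
log10-probability layout is assumed from the pasted output:

```python
# Sum the per-word log10 probabilities copied from the kenlm/query
# output above and check that they reproduce "Total: -55.599".
# The (word, matched n-gram length, log10 prob) triple layout is an
# assumption based on the pasted output.
output = """\
the 2 -0.894835
fifth 3 -3.34651
committee 2 -3.04771
resumed 1 -5.3955
its 2 -1.99768
consideration 2 -3.4901
of 3 -0.281781
the 4 -0.240104
item 3 -4.40691
at 2 -2.55249
its 2 -2.06475
64th 1 -7.43317
and 1 -2.20519
74th 0 -1.13665
meetings 1 -3.82205
, 2 -1.05335
on 3 -2.12476
15 3 -2.54839
may 4 -1.06142
and 4 -1.42049
2 3 -2.24962
june 4 -0.381742
2000 2 -1.75696
. 3 -0.68658
</s> 4 -0.000255845"""

# The log10 probability is the last field of each triple.
total = sum(float(line.split()[2]) for line in output.splitlines())
print(round(total, 3))  # -55.599, matching the reported Total
```

Note that the final </s> entry contributes only -0.000256 here, so most
of the gap to the SRILM score (-54.4623) would have to come from
something else, e.g. whether <s> is used as conditioning context.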
