Hello all,

My question is about SRILM and IRSTLM. It is not directly related to Moses, but I did not know where else to ask.
I am scoring individual sentences with a 5-gram language model, and I get different scores from SRILM and IRSTLM. The language model was trained with SRILM using the following command line:

  $ srilm/bin/i686-m64/ngram-count -order $(LM_ORDER) -interpolate -kndiscount -text text.txt -lm langmodel.lm

I do not understand why scoring the same sentence gives different scores. In this regard I have a few questions:

* Does SRILM introduce begin-of-sentence and end-of-sentence tokens during training?
* And during scoring (or decoding)?
* Does IRSTLM introduce begin-of-sentence and end-of-sentence tokens during scoring (or decoding)?
* I know SRILM uses log base 10. Does IRSTLM also use log base 10? (It seems so.)

When I score the English sentence

  "the fifth committee resumed its consideration of the item at its 64th and 74th meetings , on 15 may and 2 june 2000 ."

the scores (log probabilities) I get are:

  SRILM:  -54.4623
  IRSTLM: -49.9141

If I add <s> and </s> when scoring with IRSTLM, I get a log probability of -55.3099 (very similar to that of SRILM). The code to score with IRSTLM was borrowed from Moses.

Thank you very much for your help.

Regards.

--
Felipe Sánchez Martínez
Departamento de Lenguajes y Sistemas Informáticos
Universidad de Alicante, E-03071 Alicante (Spain)
Tel.: +34 965 903 400, ext: 2966
Fax: +34 965 909 326
http://www.dlsi.ua.es/~fsanchez
