Hi Kenneth,

The output of kenlm/query is:

Loading the LM will be faster if you build a binary file.
Reading english.5gram.lm
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************
Language model is missing <unk>.  Substituting probability 0.
************

Loading statistics:
user    18.0001
sys     0.00047
rss   316632 kB
the 2 -0.894835 fifth 3 -3.34651 committee 2 -3.04771 resumed 1 -5.3955 
its 2 -1.99768 consideration 2 -3.4901 of 3 -0.281781 the 4 -0.240104 
item 3 -4.40691 at 2 -2.55249 its 2 -2.06475 64th 1 -7.43317 and 1 
-2.20519 74th 0 -1.13665 meetings 1 -3.82205 , 2 -1.05335 on 3 -2.12476 
15 3 -2.54839 may 4 -1.06142 and 4 -1.42049 2 3 -2.24962 june 4 
-0.381742 2000 2 -1.75696 . 3 -0.68658 </s> 4 -0.000255845 Total: -55.599
After queries:
user    18.0001
sys     0.00047
rss   316656 kB
Total time including destruction:
user    18.0001
sys     0.00051
rss     1312 kB
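(For reference, the "Total" that kenlm/query prints is just the sum of the per-word log10 probabilities in the output above; a quick check in Python:)

```python
# Per-word log10 probabilities copied from the kenlm/query output above.
scores = [
    -0.894835, -3.34651, -3.04771, -5.3955, -1.99768, -3.4901,
    -0.281781, -0.240104, -4.40691, -2.55249, -2.06475, -7.43317,
    -2.20519, -1.13665, -3.82205, -1.05335, -2.12476, -2.54839,
    -1.06142, -1.42049, -2.24962, -0.381742, -1.75696, -0.68658,
    -0.000255845,  # </s>
]

# Summing them reproduces the reported "Total: -55.599".
print(round(sum(scores), 3))
```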

It seems that it is adding the end-of-sentence token, but not the 
begin-of-sentence token.

The score (-55.599) differs from SRILM's (-54.4623) and from IRSTLM's 
(-49.9141, or -55.3099 when adding <s> and </s>).
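As a side note on why the numbers move: here is a toy sketch (the probabilities below are invented, NOT from the actual LM) of how adding <s> and </s> changes which conditional probabilities get summed, and how a natural-log score could be converted to log10 for comparison with SRILM:

```python
import math

# Toy bigram log10 probabilities -- made-up values for illustration only,
# just to show which terms are (or are not) scored with boundary tokens.
logp10 = {
    ("<s>", "the"): -0.5,   # P(the | <s>): used only when <s> is prepended
    (None, "the"): -1.2,    # unigram P(the): used when there is no <s>
    ("the", "end"): -2.0,
    ("end", "</s>"): -0.3,  # extra term scored only when </s> is appended
}

# Without boundary tokens: first word falls back to its unigram, and no
# end-of-sentence term is added.
without_boundaries = logp10[(None, "the")] + logp10[("the", "end")]

# With boundary tokens: first word is conditioned on <s>, and P(</s> | ...)
# contributes one more (usually negative) term.
with_boundaries = (logp10[("<s>", "the")] + logp10[("the", "end")]
                   + logp10[("end", "</s>")])

print(without_boundaries, with_boundaries)

# If a toolkit reports natural-log scores, divide by ln(10) to compare
# against SRILM's log10 scores.
natural = with_boundaries * math.log(10)
log10_again = natural / math.log(10)
```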

Thanks for your help.
--
Felipe

On 28/10/10 18:57, Kenneth Heafield wrote:
> Hi Felipe,
>
>       Please run $recent_moses_build/kenlm/query langmodel.lm < text and post
> the output (you don't need the statistics, just the line containing
> "Total:").  That will tell you the score and n-gram length at each word.
>
> Kenneth
>
> On 10/28/10 12:42, Felipe Sánchez Martínez wrote:
>> Hello all,
>>
>> My question is about SRILM and IRSTLM; it is not directly related to
>> Moses, but I did not know where else to ask.
>>
>> I am scoring individual sentences with a 5-gram language model and I get
>> different scores with SRILM and IRSTLM.
>>
>> The language model was trained with SRILM through the following command
>> line:
>>
>> $ srilm/bin/i686-m64/ngram-count -order $(LM_ORDER) -interpolate
>> -kndiscount -text text.txt -lm langmodel.lm
>>
>> I do not know why I get different scores when scoring the same sentence.
>> In this regard I have a few questions:
>> * Does SRILM introduce begin-of-sentence and end-of-sentence tokens
>> during training?
>> * And during scoring (or decoding)?
>> * Does IRSTLM introduce begin-of-sentence and end-of-sentence tokens
>> during scoring (or decoding)?
>> * I know SRILM uses log base 10. Does IRSTLM also use log base 10? (It
>> seems so.)
>>
>> When I score the English sentence "the fifth committee resumed its
>> consideration of the item at its 64th and 74th meetings , on 15 may and
>> 2 june 2000 ." the scores (log probs) I get are:
>> SRILM: -54.4623
>> IRSTLM: -49.9141
>>
>> If I introduce <s> and </s> when scoring with IRSTLM, I get a log prob of
>> -55.3099 (very similar to that of SRILM).
>>
>> The code to score with IRSTLM was borrowed from Moses.
>>
>> Thank you very much for your help.
>>
>> Regards.
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

-- 
Felipe Sánchez Martínez
Departamento de Lenguajes y Sistemas Informáticos
Universidad de Alicante, E-03071 Alicante (Spain)
Tel.: +34 965 903 400, ext: 2966 Fax: +34 965 909 326
http://www.dlsi.ua.es/~fsanchez