Hi Kenneth,

Just to let you know that, after training SRILM with -unk and adding the
following line to my SRILM loading function

_sri_ngramLM->skipOOVs() = false;

I get the same score with SRILM and kenlm. Unfortunately, this is not the
case for IRSTLM. I'll look at my code, because I think there might be
something wrong with it.
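Training with -unk builds an open-vocabulary model in which <unk> is an ordinary vocabulary entry, so an unseen word such as "74th" can be scored as <unk> instead of being dropped. A minimal sketch of that mapping step, assuming the vocabulary is available as a std::unordered_set; mapOOVs is a hypothetical helper and not part of SRILM's API:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: replace every word missing from the vocabulary with
// the unknown-word token, as an open-vocabulary (-unk) model expects.
std::vector<std::string> mapOOVs(const std::vector<std::string>& words,
                                 const std::unordered_set<std::string>& vocab,
                                 const std::string& unk = "<unk>") {
  assert(vocab.count(unk) == 1);  // only meaningful if <unk> was trained in
  std::vector<std::string> out;
  out.reserve(words.size());
  for (const std::string& w : words) {
    out.push_back(vocab.count(w) ? w : unk);
  }
  return out;
}
```

With a model trained without -unk there is no <unk> entry to map to, which is why the precondition check is there.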

Thanks again for your help.
Regards
--
Felipe


On 29/10/10 16:09, Kenneth Heafield wrote:
> kenlm's query tool implicitly places <s> at the beginning. It doesn't
> appear in the output, but you can see the effect because the n-gram
> length after "the" is 2, not 1.
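The boundary handling described above can be sketched as follows, assuming a sentence arrives as a vector of whitespace-separated tokens (addBoundaries and scoredTokens are hypothetical names). <s> serves only as context and never receives a probability of its own, while </s> is scored like a word; that is why "the" is matched as the bigram "<s> the" and reports length 2.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Build the token sequence a sentence-level LM query sees: <s> is prepended
// as context only, </s> is appended and scored like any other token.
std::vector<std::string> addBoundaries(const std::vector<std::string>& words) {
  std::vector<std::string> seq;
  seq.reserve(words.size() + 2);
  seq.push_back("<s>");
  seq.insert(seq.end(), words.begin(), words.end());
  seq.push_back("</s>");
  return seq;
}

// Number of tokens that contribute a log probability: everything except <s>.
std::size_t scoredTokens(const std::vector<std::string>& seq) {
  assert(!seq.empty() && seq.front() == "<s>");
  return seq.size() - 1;
}
```

For the 24-token sentence in this thread this gives 25 scored tokens, which matches the 25 word/length/probability triples in the query output.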
>
> The difference between the kenlm result and SRILM is the unknown word
> "74th".  -55.599 + 1.13665 = -54.46235.  The term -1.13665 appears to be
> the LM's backoff weight for the unigram "and".  I think including the
> backoff is the right thing to do here and it's how Moses configures
> SRILM to operate (so you may want to look at LanguageModelSRI.cpp and
> copy how it initializes SRI).
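The arithmetic above can be reproduced by summing the per-word log10 terms with and without the unknown word. A minimal sketch, assuming the scores have already been read into a hypothetical WordScore struct and that an n-gram length of 0 marks an OOV, as it does for "74th" in the query output:

```cpp
#include <cassert>
#include <string>
#include <vector>

struct WordScore {
  std::string word;
  int order;      // n-gram length matched; 0 marks an unknown word
  double log10p;  // log10 probability charged to this word
};

// Sum the per-word log10 probabilities. With dropUnknown = true, entries
// whose n-gram length is 0 are left out, mirroring a scorer that excludes
// OOV terms from the sentence total.
double sentenceLog10(const std::vector<WordScore>& scores, bool dropUnknown) {
  double total = 0.0;
  for (const WordScore& s : scores) {
    if (dropUnknown && s.order == 0) continue;
    total += s.log10p;
  }
  return total;
}
```

Applied to all 25 entries of the query output, the full sum is about -55.599 and the sum without "74th" is about -54.4623, the SRILM figure.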
>
> As to IRST, I hope they find the n-gram lengths and probabilities after
> each word useful in explaining that difference.
>
> Kenneth
>
> On 10/29/10 08:55, Felipe Sánchez Martínez wrote:
>> Hi Kenneth,
>>
>> The output of kenlm/query is:
>>
>> Loading the LM will be faster if you build a binary file.
>> Reading english.5gram.lm
>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>> ****************************************************************************************
>> Language model is missing <unk>.  Substituting probability 0.
>> ************
>>
>> Loading statistics:
>> user    18.0001
>> sys     0.00047
>> rss   316632 kB
>> the 2 -0.894835 fifth 3 -3.34651 committee 2 -3.04771 resumed 1 -5.3955
>> its 2 -1.99768 consideration 2 -3.4901 of 3 -0.281781 the 4 -0.240104
>> item 3 -4.40691 at 2 -2.55249 its 2 -2.06475 64th 1 -7.43317
>> and 1 -2.20519 74th 0 -1.13665 meetings 1 -3.82205 , 2 -1.05335
>> on 3 -2.12476 15 3 -2.54839 may 4 -1.06142 and 4 -1.42049
>> 2 3 -2.24962 june 4 -0.381742 2000 2 -1.75696 . 3 -0.68658
>> </s> 4 -0.000255845 Total: -55.599
>> After queries:
>> user    18.0001
>> sys     0.00047
>> rss   316656 kB
>> Total time including destruction:
>> user    18.0001
>> sys     0.00051
>> rss     1312 kB
>>
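The per-word output can also be read programmatically, for instance to list the words the model did not know. A sketch that assumes exactly the "word length log10-prob" layout shown above, ending in a "Total:" marker; parseEntries and unknownWords are illustrative names, not part of kenlm:

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct QueryEntry {
  std::string word;
  int order;      // length of the n-gram matched; 0 means unknown word
  double log10p;  // log10 probability charged to this word
};

// Parse the per-word part of a query output line, i.e. repeated
// "word order log10p" triples, stopping at the "Total:" marker.
std::vector<QueryEntry> parseEntries(const std::string& line) {
  std::vector<QueryEntry> entries;
  std::istringstream in(line);
  std::string word;
  while (in >> word && word != "Total:") {
    QueryEntry e{word, 0, 0.0};
    if (!(in >> e.order >> e.log10p)) {
      throw std::runtime_error("truncated entry after \"" + word + "\"");
    }
    entries.push_back(e);
  }
  return entries;
}

// Words the model did not know (reported with n-gram length 0).
std::vector<std::string> unknownWords(const std::vector<QueryEntry>& entries) {
  std::vector<std::string> out;
  for (const QueryEntry& e : entries) {
    if (e.order == 0) out.push_back(e.word);
  }
  return out;
}
```

On the full line above, unknownWords returns only "74th", the word whose term accounts for the difference from SRILM.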
>> It seems that it adds the end-of-sentence token, but not the
>> beginning-of-sentence token.
>>
>> The score (-55.599) differs from both SRILM's (-54.4623) and IRSTLM's
>> (-49.9141, or -55.3099 when adding <s> and </s>).
>>
>> Thanks for your help
>> --
>> Felipe
>>
On 28/10/10 18:57, Kenneth Heafield wrote:
>>> Hi Felipe,
>>>
>>> Please run $recent_moses_build/kenlm/query langmodel.lm < text and post
>>> the output (you don't need the statistics, just the line containing
>>> "Total:"). That will tell you the score and n-gram length at each word.
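When the output of this command is captured (for example from a pipe), only the line containing "Total:" is needed. A sketch of extracting that value from the whole output while skipping the loading statistics; findTotal is a hypothetical helper and assumes the number follows "Total:" on the same line, as in the output quoted in this thread:

```cpp
#include <cassert>
#include <istream>
#include <optional>
#include <sstream>
#include <string>

// Scan query output and return the number following "Total:" on the first
// line that contains it, or std::nullopt if no such line is found.
std::optional<double> findTotal(std::istream& out) {
  const std::string key = "Total:";
  std::string line;
  while (std::getline(out, line)) {
    std::string::size_type pos = line.find(key);
    if (pos == std::string::npos) continue;  // statistics, headers, etc.
    std::istringstream rest(line.substr(pos + key.size()));
    double total = 0.0;
    if (rest >> total) return total;
  }
  return std::nullopt;
}
```

Lines such as "Total time including destruction:" do not contain the exact key "Total:", so they are skipped.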
>>>
>>> Kenneth
>>>
>>> On 10/28/10 12:42, Felipe Sánchez Martínez wrote:
>>>> Hello all,
>>>>
>>>> My question is about SRILM and IRSTLM. It is not directly related to
>>>> Moses, but I did not know where else to ask.
>>>>
>>>> I am scoring individual sentences with a 5-gram language model and I get
>>>> different scores with SRILM and IRSTLM.
>>>>
>>>> The language model was trained with SRILM through the following command
>>>> line:
>>>>
>>>> $ srilm/bin/i686-m64/ngram-count -order $(LM_ORDER) -interpolate \
>>>>     -kndiscount -text text.txt -lm langmodel.lm
>>>>
>>>> I do not know why I get different scores when scoring the same sentence.
>>>> In this regard, I have a few questions:
>>>> * Does SRILM introduce begin-of-sentence and end-of-sentence tokens
>>>> during training?
>>>> * And during scoring (or decoding)?
>>>> * Does IRSTLM introduce begin-of-sentence and end-of-sentence tokens
>>>> during scoring (or decoding)?
>>>> * I know SRILM uses log base 10. Does IRSTLM also use log base 10? (It
>>>> seems so.)
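SRILM writes log probabilities in base 10, and kenlm's query output uses the same base. A sketch of two conversions that help when comparing toolkits, assuming N counts every scored token including </s>: ln p = log10 p * ln 10, and perplexity = 10^(-total / N). The function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Convert a base-10 log probability to a natural-log probability:
// ln p = log10 p * ln 10.
double log10ToLn(double log10p) { return log10p * std::log(10.0); }

// Perplexity from a summed log10 probability over n scored tokens:
// ppl = 10^(-total / n).
double perplexity(double totalLog10, std::size_t n) {
  assert(n > 0);
  return std::pow(10.0, -totalLog10 / static_cast<double>(n));
}
```

If a toolkit reported natural logs, its totals would be about 2.3 times larger in magnitude than SRILM's; the IRSTLM and SRILM figures in this thread (-55.3099 and -54.4623) are close, which is consistent with base 10. For kenlm's -55.599 over 25 tokens, the perplexity comes out at roughly 167.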
>>>>
>>>> When I score the English sentence "the fifth committee resumed its
>>>> consideration of the item at its 64th and 74th meetings , on 15 may and
>>>> 2 june 2000 .", the scores (log prob) I get are:
>>>> SRILM: -54.4623
>>>> IRSTLM: -49.9141
>>>>
>>>> If I add <s> and </s> when scoring with IRSTLM, I get a log prob of
>>>> -55.3099 (very similar to that of SRILM).
>>>>
>>>> The code to score with IRSTLM was borrowed from Moses.
>>>>
>>>> Thank you very much for your help.
>>>>
>>>> Regards.
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>

-- 
Felipe Sánchez Martínez
Departamento de Lenguajes y Sistemas Informáticos
Universidad de Alicante, E-03071 Alicante (Spain)
Tel.: +34 965 903 400, ext: 2966 Fax: +34 965 909 326
http://www.dlsi.ua.es/~fsanchez