Re: [Moses-support] Different scores with SRILM and IRSTLM

Felipe Sánchez Martínez Fri, 29 Oct 2010 05:57:19 -0700

Hi Philipp,

I'm not using the decoder. I am using SRILM directly and scoring 
sentences using the following piece of code:


   TextStats ts;
   VocabString words[maxWordsPerLine+1];
   // parseWords() takes a writable char*, so copy the sentence held in
   // `segment` into a null-terminated buffer (std::vector<char> instead of
   // a variable-length array, which is not standard C++)
   std::vector<char> segment_str(segment.begin(), segment.end());
   segment_str.push_back('\0');

   _sri_vocab->parseWords(&segment_str[0], words, maxWordsPerLine+1);
   //double p = LogPtoProb(_sri_ngramLM->sentenceProb(words, ts));
   LogP p = _sri_ngramLM->sentenceProb(words, ts); // log prob of the sentence

   cerr<<"Segment: "<<segment<<endl;
   cerr<<"Num. words: "<<ts.numWords<<endl;
   cerr<<"Num. words OOV: "<<ts.numOOVs<<endl;
   cerr<<"Prob: "<<p<<endl;

and IRSTLM using the following code from a colleague:

   string buf;
   vector<string> s_unigrams;

   stringstream ss(frame); // frame is the sentence to score
   int lmId = 0;
   unsigned int num = 0;   // number of n-grams scored
   float prob = 0.0f;
   float sprob = 0.0f;     // accumulated sentence log prob

   while (ss >> buf) {
     s_unigrams.push_back(buf);
   }

   // Note: only full-order n-grams are scored here, so the first
   // m_nGramOrder-1 words of the sentence never receive a probability.
   for(unsigned int i = (m_nGramOrder - 1); i < s_unigrams.size(); i++) {
     buf = "";
     // ngram of words, rebuilt on the stack for each position (the original
     // new/delete version leaked the ngram allocated before the loop)
     ngram ng(m_lmtb->getDict());
     for(unsigned int j = m_nGramOrder; j > 0; j--) {
       buf = buf + s_unigrams.at(i - (j - 1)) + " ";
       lmId = m_lmtb->getDict()->encode(s_unigrams.at(i - (j - 1)).c_str());
       ng.pushc(lmId);
     }

     prob = m_lmtb->clprob(ng);
     sprob += prob;
     //cerr << "_" << m_nGramOrder << ": " << buf << " " << prob << endl;
     num++;
   }

   cerr<<"Prob: "<<sprob<<endl;


This last piece of code is supposed to come from Moses. As I did not
write it myself, I do not know which file it was borrowed from.


Thank you very much for your help.

--
Felipe

On 28/10/10 18:50, Philipp Koehn wrote:
> Hi,
>
> this should not happen - I know in the past there were some
> issues with unknown words. Can you track down where the
> n-gram scores differ (e.g., run the decoder with "-v 3"), or
> dump out the search graph?
>
> -phi
>
> 2010/10/28 Felipe Sánchez Martínez<[email protected]>:
>> Hello all,
>>
>> My question is about SRILM and IRSTLM, it is not directly related to
>> Moses, but I did not know where to ask.
>>
>> I am scoring individual sentences with a 5-gram language model and I get
>> different scores with SRILM and IRSTLM.
>>
>> The language model was trained with SRILM through the following command
>> line:
>>
>> $ srilm/bin/i686-m64/ngram-count -order $(LM_ORDER) -interpolate
>> -kndiscount -text text.txt -lm langmodel.lm
>>
>> I do not know why I get different scores when scoring the same sentence.
>> In this regard, I have a few questions:
>> * Does SRILM introduce begin-of-sentence and end-of-sentence tokens
>> during training?
>> * And during scoring (or decoding)?
>> * Does IRSTLM introduce begin-of-sentence and end-of-sentence tokens
>> during scoring (or decoding)?
>> * I know SRILM uses log base 10. Does IRSTLM also use log base 10? (It
>> seems so.)
>>
>> When I score the English sentence "the fifth committee resumed its
>> consideration of the item at its 64th and 74th meetings , on 15 may and
>> 2 june 2000 ." the scores (log probs) I get are:
>> SRILM: -54.4623
>> IRSTLM: -49.9141
>>
>> If I introduce <s> and </s> when scoring with IRSTLM, I get a log prob of
>> -55.3099 (very similar to that of SRILM).
>>
>> The code to score with IRSTLM was borrowed from Moses.
>>
>> Thank you very much for your help.
>>
>> Regards.
>> --
>> Felipe Sánchez Martínez
>> Departamento de Lenguajes y Sistemas Informáticos
>> Universidad de Alicante, E-03071 Alicante (Spain)
>> Tel.: +34 965 903 400, ext: 2966 Fax: +34 965 909 326
>> http://www.dlsi.ua.es/~fsanchez
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>

