Hi Philipp,
I'm not using the decoder. I am using SRILM directly and scoring
sentences using the following piece of code:
TextStats ts;
VocabString words[maxWordsPerLine+1];
char segment_str[segment.size()+1]; // sentence to score is in segment
segment.copy(segment_str, segment.size(), 0);
segment_str[segment.size()]='\0';
_sri_vocab->parseWords(segment_str, words, maxWordsPerLine+1);
//double p = LogPtoProb(_sri_ngramLM->sentenceProb(words, ts));
LogP p = _sri_ngramLM->sentenceProb(words, ts);
cerr<<"Segment: "<<segment<<endl;
cerr<<"Num. words: "<<ts.numWords<<endl;
cerr<<"Num. words OOV: "<<ts.numOOVs<<endl;
cerr<<"Prob: "<<p<<endl;
and IRSTLM using the following code from a colleague:
string buf;
vector<string> s_unigrams;
stringstream ss(frame); // frame is the sentence to score
ngram* m_lmtb_ng;
int lmId = 0;
float num = 0.0f;
float prob = 0.0f;
float sprob = 0.0f;
m_lmtb_ng = new ngram(m_lmtb->getDict()); // ngram of words
m_lmtb_ng->size = 0;
while (ss >> buf) {
    s_unigrams.push_back(buf);
}
for(unsigned int i = (m_nGramOrder - 1); i < (s_unigrams.size()); i++) {
    buf = "";
    m_lmtb_ng = new ngram(m_lmtb->getDict()); // ngram of words
    for(unsigned int j = m_nGramOrder; j > 0; j--) {
        buf = buf + s_unigrams.at(i - (j - 1)) + " ";
        lmId = m_lmtb->getDict()->encode(s_unigrams.at(i - (j - 1)).c_str());
        m_lmtb_ng->pushc(lmId);
    }
    prob = m_lmtb->clprob(*m_lmtb_ng);
    sprob += prob;
    //cerr << "_" << m_nGramOrder << ": " << buf << " " << prob << endl;
    delete m_lmtb_ng;
    num++;
}
cerr<<"Prob: "<<sprob<<endl;
This last piece of code is supposed to come from Moses. As I did not
write it myself, I do not know which file it was borrowed from.
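
One thing that stands out in the IRSTLM loop: it starts at
i = m_nGramOrder - 1 and only ever builds full-order n-grams, so with a
5-gram model the first four words of the sentence are never the predicted
word. If SRILM's sentenceProb scores every word (using a shorter history at
the start of the sentence), the two totals cannot match. The sketch below
only enumerates the windows each strategy would score; it does not call
SRILM or IRSTLM, and the names Window, fullOrderWindows and allWordWindows
are made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One scored event: the predicted word and the history it is conditioned on.
// (Hypothetical helper type, not part of SRILM, IRSTLM or Moses.)
struct Window {
    std::vector<std::string> history;
    std::string predicted;
};

// Windows visited by a loop that starts at index order-1 and always uses a
// full-length history, mirroring the index arithmetic of the IRSTLM snippet.
std::vector<Window> fullOrderWindows(const std::vector<std::string>& words,
                                     std::size_t order) {
    std::vector<Window> out;
    if (order == 0) return out;
    for (std::size_t i = order - 1; i < words.size(); ++i) {
        const std::ptrdiff_t first = static_cast<std::ptrdiff_t>(i - (order - 1));
        const std::ptrdiff_t last = static_cast<std::ptrdiff_t>(i);
        Window w;
        w.history.assign(words.begin() + first, words.begin() + last);
        w.predicted = words[i];
        out.push_back(w);
    }
    return out;
}

// Windows when every word is predicted, with the history truncated at the
// start of the sentence (assumed here to be what a per-sentence scorer does).
std::vector<Window> allWordWindows(const std::vector<std::string>& words,
                                   std::size_t order) {
    std::vector<Window> out;
    if (order == 0) return out;
    for (std::size_t i = 0; i < words.size(); ++i) {
        const std::size_t start = (i + 1 > order) ? i + 1 - order : 0;
        Window w;
        w.history.assign(words.begin() + static_cast<std::ptrdiff_t>(start),
                         words.begin() + static_cast<std::ptrdiff_t>(i));
        w.predicted = words[i];
        out.push_back(w);
    }
    return out;
}
```

For a sentence of N tokens and order n, the first function yields
N - (n - 1) windows (none if N < n) and the second yields N, so the
difference is exactly the first n - 1 words.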
Thank you very much for your help.
--
Felipe
On 28/10/10 18:50, Philipp Koehn wrote:
> Hi,
>
> this should not happen - I know in the past there were some
> issues with unknown words. Can you track down where the
> n-gram scores differ (e.g., run the decoder with "-v 3"), or
> dump out the search graph?
>
> -phi
>
> 2010/10/28 Felipe Sánchez Martínez <[email protected]>:
>> Hello all,
>>
>> My question is about SRILM and IRSTLM, it is not directly related to
>> Moses, but I did not know where to ask.
>>
>> I am scoring individual sentences with a 5-gram language model and I get
>> different scores with SRILM and IRSTLM.
>>
>> The language model was trained with SRILM through the following command
>> line:
>>
>> $ srilm/bin/i686-m64/ngram-count -order $(LM_ORDER) -interpolate
>> -kndiscount -text text.txt -lm langmodel.lm
>>
>> I do not know why when scoring the same sentence I get different scores.
>> In this regard I have a few questions:
>> * Does SRILM introduce begin-of-sentence and end-of-sentence tokens
>> during training?
>> * And during scoring (or decoding)?
>> * Does IRSTLM introduce begin-of-sentence and end-of-sentence tokens
>> during scoring (or decoding)?
>> * I know SRILM uses log base 10. Does IRSTLM also use log base 10? (It
>> seems so)
>>
>> When I score the English sentence "the fifth committee resumed its
>> consideration of the item at its 64th and 74th meetings , on 15 may and
>> 2 june 2000 ." the scores (log probs) I get are:
>> SRILM: -54.4623
>> IRSTLM: -49.9141
>>
>> If I introduce <s> and </s> when scoring with IRSTLM I get a log prob of
>> -55.3099 (very similar to that of SRILM).
>>
>> The code to score with IRSTLM was borrowed from Moses.
>>
>> Thank you very much for your help.
>>
>> Regards.
>> --
>> Felipe Sánchez Martínez
>> Departamento de Lenguajes y Sistemas Informáticos
>> Universidad de Alicante, E-03071 Alicante (Spain)
>> Tel.: +34 965 903 400, ext: 2966 Fax: +34 965 909 326
>> http://www.dlsi.ua.es/~fsanchez
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>