Hi all,
It would be great if the experts (those familiar with the decoder and the
SRILM integration) could look at the comments I have added below to the
GetValue function of SRI.cpp, answer the open questions (in bold) and
correct anything I got wrong. I think this will make things a lot clearer for
everyone who is planning to do the integration. I would really appreciate it
if you could spend a few minutes going over this and give me precise feedback.
Please read all the comments to make sure we are on the same page.
Thanks, -K.
From SRI.cpp:
//K: contextFactor is the sequence of words we want to assign a probability to. Example: "The dog chases"
//K: What is *finalState when "The dog chases" is passed to this function?
LMResult LanguageModelSRI::GetValue(const vector<const Word*> &contextFactor, State* finalState) const
{
...
//K: count is the size of the sequence. In this example, count is 3
size_t count = contextFactor.size();
...
//K: ngram is an uninitialized array of size count + 1, i.e. 4 in this example
VocabIndex ngram[count + 1];
//K: fills ngram[1] .. ngram[count-1] with every word of contextFactor except the last, in reverse order
for (size_t i = 0 ; i < count - 1 ; i++) {
ngram[i+1] = GetLmID((*contextFactor[count-2-i])[factorType]);
}
//K: break-down of the loop (count = 3):
//K:   i = 0: ngram[0+1] = contextFactor[3-2-0] = contextFactor[1] = dog
//K:   i = 1: ngram[1+1] = contextFactor[3-2-1] = contextFactor[0] = the
//K: the ngram array after the for-loop is: [ , dog, the, ]
//K: very weird ordering for ngram: the words are stored in reverse order (most recent first).
*//K: What is the use of Vocab_None?*
ngram[count] = Vocab_None;
//K: ngram after the above steps is: [ , dog, the, Vocab_None ]
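To make the layout concrete, here is a minimal sketch that rebuilds the ngram array with the same loop. It uses hypothetical stand-ins (VocabIndex as unsigned int, a kVocabNone sentinel, and the helpers BuildNgram and ContextLength), not the real SRILM types. Assuming SRILM's convention of Vocab_None-terminated context arrays stored most-recent-first (worth confirming against the SRILM headers), the sentinel plays the same role as the '\0' at the end of a C string: a callee that only receives a pointer such as ngram+1 needs it to know where the context ends.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for SRILM's VocabIndex and Vocab_None.
using VocabIndex = unsigned int;
const VocabIndex kVocabNone = static_cast<VocabIndex>(-1);  // assumed sentinel value

// Same loop as in GetValue: slot 0 is left free for the predicted word,
// slots 1..count-1 get the history most-recent-first, slot count gets the
// sentinel. Like the original, this assumes ids.size() >= 1.
std::vector<VocabIndex> BuildNgram(const std::vector<VocabIndex>& ids) {
  const std::size_t count = ids.size();
  std::vector<VocabIndex> ngram(count + 1, 0);  // slot 0 stays 0 here
  for (std::size_t i = 0; i < count - 1; i++) {
    ngram[i + 1] = ids[count - 2 - i];
  }
  ngram[count] = kVocabNone;
  return ngram;
}

// Works like strlen(): a callee that only receives a pointer (e.g. ngram + 1)
// finds the end of the context by scanning for the sentinel.
std::size_t ContextLength(const VocabIndex* context) {
  std::size_t n = 0;
  while (context[n] != kVocabNone) ++n;
  return n;
}
```

With hypothetical ids The=10, dog=20, chases=30, BuildNgram({10, 20, 30}) yields [0, 20, 10, kVocabNone], and ContextLength(ngram.data() + 1) returns 2.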
...
//K: lmId contains the id of the last word of the sequence, which in this case is the id for "chases"
VocabIndex lmId = GetLmID((*contextFactor[count-1])[factorType]);
//K: get the probability of the last word "chases", given the context stored in the ngram array [ , dog, the, Vocab_None ]
//K: ngram+1 skips the empty cell at the beginning of the ngram array.
ret = GetValue(lmId, ngram+1);
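The lookup on ngram+1 can be illustrated with a toy ARPA-style backoff model. This is a sketch under stated assumptions, not SRILM's implementation: ToyBackoffLM, WordProb and the -99 floor for unknown words are hypothetical, and the context is assumed to arrive most-recent-first and sentinel-terminated, as in the ngram array above. It also suggests why the reversed order is convenient: backing off drops the oldest history word, which in a reversed context is simply the last element.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using VocabIndex = unsigned int;                            // hypothetical stand-in
const VocabIndex kVocabNone = static_cast<VocabIndex>(-1);  // assumed sentinel

// Hypothetical toy model. Contexts are stored most-recent-first, the same
// order as ngram+1 in GetValue.
struct ToyBackoffLM {
  // log10 P(word | context), keyed by (word, context)
  std::map<std::pair<VocabIndex, std::vector<VocabIndex>>, double> logprob;
  // log10 backoff weight of a context
  std::map<std::vector<VocabIndex>, double> backoff;

  // If (context, w) is unseen, add the context's backoff weight and retry
  // with the oldest word dropped (pop_back on the reversed context).
  double WordProb(VocabIndex w, const VocabIndex* context) const {
    std::vector<VocabIndex> ctx;
    for (std::size_t i = 0; context[i] != kVocabNone; ++i) ctx.push_back(context[i]);
    double penalty = 0.0;
    while (true) {
      auto it = logprob.find({w, ctx});
      if (it != logprob.end()) return penalty + it->second;
      if (ctx.empty()) return -99.0;  // assumed floor for unknown words
      auto bo = backoff.find(ctx);
      if (bo != backoff.end()) penalty += bo->second;
      ctx.pop_back();
    }
  }
};
```

For example, if the model only stores log10 P(chases | dog) = -0.5 and a backoff weight of -0.2 for the history [dog, the], then WordProb(chases, [dog, the, kVocabNone]) falls back to -0.2 + -0.5 = -0.7.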
//K: if finalState is not zero.
*//K: What does it mean for finalState not to be zero?*
if (finalState) {
//K: Now the first empty cell of the ngram array gets filled with "chases"
ngram[0] = lmId;
//K: ngram array is now [chases, dog, the, Vocab_None]
*//K: What is this?*
unsigned int dummy;
//K: SRILM returns an id for the full sequence "the dog chases".
//K: The id returned by SRILM and finalState seem to be the same thing.
*finalState = m_srilmModel->contextID(ngram, dummy);
}
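For the contextID call, here is a sketch of what the returned state could look like. It assumes, and this should be checked against the SRILM headers, that contextID(context, length) returns an identifier for the longest leading part of the context (the most recent words) that the model has statistics for, and writes the number of words it used into length; that would explain why GetValue passes a dummy it never reads. ToyContextLM and ContextID are hypothetical. Note that ngram, not ngram+1, is passed here, so the state describes the history for the next word: [chases, dog, the].

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using VocabIndex = unsigned int;                            // hypothetical stand-in
const VocabIndex kVocabNone = static_cast<VocabIndex>(-1);  // assumed sentinel
using State = const void*;                                  // assumed opaque state type

// Hypothetical toy model: the contexts (most-recent-first) it has statistics
// for. Addresses of map entries are stable, so they can serve as state ids.
struct ToyContextLM {
  std::map<std::vector<VocabIndex>, int> contexts;

  State ContextID(const VocabIndex* context, unsigned& length) const {
    std::vector<VocabIndex> ctx;
    for (std::size_t i = 0; context[i] != kVocabNone; ++i) ctx.push_back(context[i]);
    for (;;) {
      auto it = contexts.find(ctx);
      if (it != contexts.end()) {
        length = static_cast<unsigned>(ctx.size());  // words actually used
        return &it->second;  // address of the stored entry is the id
      }
      if (ctx.empty()) {
        length = 0;
        return nullptr;  // the model does not even know the empty context
      }
      ctx.pop_back();  // forget the oldest remaining word
    }
  }
};
```

In a trigram model only contexts of up to two words are stored, so "the dog chases" and "a dog chases" both map to the state for [chases, dog] with length 2. Presumably this is what lets the decoder treat hypotheses that differ only in older words as equivalent.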
//K: So the function takes a sequence, returns a score for its last word given the preceding words, and updates finalState.
//K: Now if we call this function again with "The dog chases her", finalState basically holds the id for "The dog chases".
*//K: Is this correct?*
return ret;
}
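Since GetValue scores only the last word of contextFactor given the words before it, a score for a whole sentence such as "The dog chases her" would come from one call per word, summed in log space (chain rule). The sketch below assumes a hypothetical WordScorer with that same contract and a model of order n that only looks at the last n words; it is not Moses' actual calling code. In the lines shown above, finalState is only written, never read, so each call rebuilds its history from contextFactor.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical per-word scorer with the same contract as GetValue: it
// returns log10 P(last word | preceding words) for the words it is given.
using WordScorer = std::function<double(const std::vector<std::string>&)>;

// Sums per-word scores; word i is scored with a window of at most `order`
// words ending at word i (its history plus the word itself).
double SentenceScore(const std::vector<std::string>& words, std::size_t order,
                     const WordScorer& score) {
  double total = 0.0;
  for (std::size_t i = 0; i < words.size(); ++i) {
    std::size_t start = (i + 1 > order) ? i + 1 - order : 0;
    std::vector<std::string> window(words.begin() + start, words.begin() + i + 1);
    total += score(window);
  }
  return total;
}
```

With order 3, the four calls for "the dog chases her" see [the], [the, dog], [the, dog, chases] and [dog, chases, her].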
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support