[Moses-support] Same story again: lm integration into moses (hopefully the last time)

koormoosh Thu, 10 Dec 2015 04:18:14 -0800

Hi all,

It would be great if the experts who are familiar with the decoder and the
SRILM integration could look at the comments I have added below to the
GetValue function of SRI.cpp, answer the remaining questions (in bold), and
add any corrections. I think this will make things a lot clearer for
everyone who is planning to do the integration. I would really appreciate
it if you could spend a few minutes going over this and give me your
precise feedback. Please read all the comments to make sure we are on the
same page.


Thanks, -K.

From SRI.cpp:

//K: contextFactor is the sequence of words we want to assign a probability to.
//K: Example: "The dog chases"
//K: what is *finalState when "The dog chases" is sent to this function?
LMResult LanguageModelSRI::GetValue(const vector<const Word*>
&contextFactor, State* finalState) const
{
  ...

  //K: count is the size of the sequence. In this example, count is 3
  size_t count = contextFactor.size();
  ...

  //K: ngram is an empty array of size 4 in this example
  VocabIndex ngram[count + 1];

  //K: fills ngram array using "part" of the contextFactor
  for (size_t i = 0 ; i < count - 1 ; i++) {
    ngram[i+1] =  GetLmID((*contextFactor[count-2-i])[factorType]);
  }
  //K: break-down of the loop:
  //K:   ngram[0+1] = contextFactor[3-2-0] = dog
  //K:   ngram[1+1] = contextFactor[3-2-1] = the
  //K: the ngram array after the for-loop is: [ , dog, the, ]
  //K: very weird ordering for ngram.


  *//K: what is the use of Vocab_None?*
  ngram[count] = Vocab_None;
  //K: ngram after the above steps is: [ , dog, the, Vocab_None ]

  ...

  //K: lmId contains the id of the last word of the sequence, which in this
  //K: case is the id for "chases"
  VocabIndex lmId = GetLmID((*contextFactor[count-1])[factorType]);

  //K: getting the probability of the last word "chases", given the context
  //K: stored in the ngram array [ , dog, the, Vocab_None ]
  //K: the ngram+1 is to skip the empty cell at the beginning of the
  //K: ngram array.
  ret = GetValue(lmId, ngram+1);

  //K: if finalState is not zero.
  *//K: What does it mean for finalState not to be zero?*
  if (finalState) {
    //K: Now the first empty cell of the ngram array gets filled with
    //K: "chases"
    ngram[0] = lmId;
    //K: ngram array is now [chases, dog, the, Vocab_None]

    *//K: what is this?*
    unsigned int dummy;

    //K: an id for the full sequence "the dog chases" is being returned by
    //K: srilm.
    //K: The id returned by srilm and finalState seem to be the same thing.
    *finalState = m_srilmModel->contextID(ngram, dummy);
  }

  //K: So the function takes a sequence, returns a score for the
  //K: sequence, and updates the finalState
  //K: now if we call this function again with "The dog chases her", the
  //K: finalState basically holds the id for "The dog chases".
  *//K: Is this correct?*

  return ret;
}
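
For reference, the array layout traced in the comments above can be
reproduced in isolation. The following is only a minimal sketch, not SRILM
or Moses code: ToyIndex, kToyNone, ToyQuery and BuildQuery are hypothetical
names made up for illustration, and kToyNone merely plays the role that
Vocab_None plays above (an end marker, since the array carries no separate
length). It assumes the input holds at least one word id.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration; not the SRILM or Moses types.
using ToyIndex = unsigned;
const ToyIndex kToyNone = static_cast<ToyIndex>(-1);  // role of Vocab_None

struct ToyQuery {
  ToyIndex word;                  // last word, the one being scored
  std::vector<ToyIndex> context;  // history, most recent word first, then kToyNone
  std::vector<ToyIndex> full;     // word followed by context (the "state" array)
};

// Mirrors the index arithmetic of the loop in GetValue on plain integers.
// Precondition (assumed): ids is not empty.
ToyQuery BuildQuery(const std::vector<ToyIndex>& ids) {
  assert(!ids.empty());
  const std::size_t count = ids.size();
  std::vector<ToyIndex> ngram(count + 1);

  // Same range as "i < count - 1", written so it cannot underflow.
  for (std::size_t i = 0; i + 1 < count; ++i) {
    ngram[i + 1] = ids[count - 2 - i];  // walk the history backwards
  }
  ngram[count] = kToyNone;  // end marker after the oldest history word

  ToyQuery q;
  q.word = ids[count - 1];
  q.context.assign(ngram.begin() + 1, ngram.end());  // what "ngram + 1" points at
  ngram[0] = q.word;                                 // same step as "ngram[0] = lmId"
  q.full = ngram;
  return q;
}
```

With ids {10, 11, 12} standing for "The dog chases", BuildQuery yields
word = 12, context = [11, 10, kToyNone] and full = [12, 11, 10, kToyNone],
which matches the [chases, dog, the, Vocab_None] layout in the comments.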

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
