Hi Moses developers, I have integrated an LM with unlimited history into the Moses code by implementing the StatefulFeatureFunction class, following the example in moses/FF/SkeletonStatefulFF.cpp. My LM code can also handle n-grams by using KenLM. Since KenLM is already integrated into Moses, I compared my new code against standard Moses with cs-en models. Standard Moses gave 17.12 BLEU points, while my new code with the same ini file and the same LM file resulted in 13.4 BLEU. This led me to read the Kenlm.cpp implementation in much more detail, and I found a number of interesting aspects that I would like to understand in order to integrate a long-span model correctly.
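For reference, the stateful-LM pattern I followed can be sketched as a self-contained toy (class, method, and field names here are my own simplifications for illustration, not the real Moses API from moses/FF/StatefulFeatureFunction.h, and the per-word scores are dummy values):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for Moses' FFState: the state is whatever the LM needs
// to remember about the hypothesis so far. For a long-span LM this is
// the whole target prefix; for an n-gram LM it would be the last N-1 words.
struct State {
    std::vector<std::string> context;
    bool operator==(const State& o) const { return context == o.context; }
};

// Toy stand-in for a stateful feature function: score the newly added
// words given the previous state, and return the successor state that
// the decoder attaches to the extended hypothesis.
class ToyStatefulLM {
public:
    State Extend(const State& prev,
                 const std::vector<std::string>& newWords,
                 double* score) const {  // caller initialises *score
        State next = prev;
        for (const std::string& w : newWords) {
            *score += WordScore(next.context, w);
            next.context.push_back(w);
        }
        return next;
    }

private:
    // Dummy per-word log-score; a real model would consult the LM here.
    double WordScore(const std::vector<std::string>& ctx,
                     const std::string& /*w*/) const {
        return -1.0 / (1.0 + ctx.size());
    }
};
```

In the real decoder the returned state is what hypothesis recombination compares; this toy only shows the score-and-extend loop, which is the behaviour I expected from the n-gram implementation as well.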
1/ In the Evaluate(const Hypothesis &hypo, const FFState *ps, ScoreComponentCollection *out) const method, moses/LM/Kenlm.cpp calculates the score over only the first N - 1 words, where N is the n-gram order. Why not calculate the score for all words in the target phrase, from hypo.GetCurrTargetWordsRange().GetStartPos() to hypo.GetCurrTargetWordsRange().GetEndPos()? Is this an optimisation, or is it required to make the translation work?

2/ When the translation phrase table is loaded, Moses calls void LanguageModel::Evaluate(const Phrase &source, const TargetPhrase &targetPhrase, ScoreComponentCollection &scoreBreakdown, ScoreComponentCollection &estimatedFutureScore) const, where LanguageModel is the ancestor of the LanguageModelKen<Model> template implemented in Kenlm.cpp. Moses calls this Evaluate() method for each source-target translation phrase, assigning the accumulated score of the first N-1 words to estimatedFutureScore and the accumulated score over the rest of the phrase to scoreBreakdown. What is the purpose of splitting the n-gram scores into the first N-1 words and the rest of the phrase? Is the scoreBreakdown value added to the score calculated by the Evaluate() method from point 1/ during the search, to obtain the total phrase score?

3/ The method LanguageModelKen<Model>::CalcScore(const Phrase &phrase, float &fullScore, float &ngramScore, size_t &oovCount) const, also used in the calculation in point 2/ above, distinguishes between terminal and non-terminal words. What are these? Is the distinction relevant to long-span LMs? And why is the terminal/non-terminal distinction not necessary in the Evaluate() method described in point 1/ above?

After changing my implementation to incorporate the behaviour described above, I got an exact match with the output of Moses's "native" KenLM implementation. However, the behaviour in point 1/ is not suitable for long-span LMs, and point 2/ does not make sense for non-n-gram models.
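To make sure I am reading the split in point 2/ correctly, here is my understanding as a self-contained toy (the function name and the dummy per-word scores are mine, not Moses code): with an order-N model, the first N-1 words of a phrase cannot see their full left context inside the phrase, so their scores are only an estimate, while the remaining words are scored with a complete in-phrase n-gram context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy sketch of the two-bucket split (my reading of point 2/, assuming
// order >= 1): scores of the first N-1 words go to the future-score
// estimate, scores of the fully-contexted remainder go to the breakdown.
void SplitPhraseScore(const std::vector<double>& perWordScores,
                      std::size_t order,
                      double* estimatedFutureScore,  // first N-1 words
                      double* scoreBreakdown) {      // the rest
    *estimatedFutureScore = 0.0;
    *scoreBreakdown = 0.0;
    for (std::size_t i = 0; i < perWordScores.size(); ++i) {
        if (i < order - 1)
            *estimatedFutureScore += perWordScores[i];
        else
            *scoreBreakdown += perWordScores[i];
    }
}
```

For example, a 5-word phrase under an order-3 model would put the first two word scores into estimatedFutureScore and the remaining three into scoreBreakdown; if that reading is wrong, I would be glad to be corrected.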
I would expect my long-span LM to operate in such a way that the LM state assigned to a hypothesis covers all target words from the start to the end of the hypothesis, and when a hypothesis is extended, its LM state is extended one target word at a time, in a loop over the new phrase from start to finish. The n-gram LM implementation does not work this way, and scoring an n-gram model this way seems to harm its performance.

Can anyone shed some light on the motivation behind the behaviour described above in points 1-3? I used Moses with its default, a.k.a. "normal", search algorithm (no [search-algorithm] variable specified in my config). For completeness, my config when using Moses with its KenLM class is pasted below.

Best regards,
David

# input factors
[input-factors]
0

# mapping steps
[mapping]
0 T 0

[distortion-limit]
6

# feature functions
[feature]
UnknownWordPenalty
WordPenalty
PhrasePenalty
PhraseDictionaryMemory name=TranslationModel0 table-limit=20 num-features=4 path=model/phrase-table.1.gz input-factor=0 output-factor=0
LexicalReordering name=LexicalReordering0 num-features=6 type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 path=model/reordering-table.1.wbe-msd-bidirectional-fe.gz
Distortion
KENLM lazyken=1 name=LM0 factor=0 path=lm/europarl.binlm.1 order=5

# dense weights for feature functions
[weight]
UnknownWordPenalty0= 1
WordPenalty0= -1
PhrasePenalty0= 0.2
TranslationModel0= 0.2 0.2 0.2 0.2
LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
Distortion0= 0.3
LM0= 0.5

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
