On 04/03/14 03:55, David Mrva wrote:
> Hi Moses developers,
>
> I have integrated an LM with unlimited history into the Moses code by
> implementing the StatefulFeatureFunction class, following the example in
> moses/FF/SkeletonStatefulFF.cpp. My LM code can also handle n-grams by
> using KenLM. As KenLM is integrated into Moses, I compared my new code
> against standard Moses on cs-en models. Standard Moses gave 17.12 BLEU
> points, while my new code with the same ini file and the same LM file
> resulted in BLEU 13.4. This led me to read the moses/LM/Kenlm.cpp
> implementation in much more detail, and I found a number of interesting
> aspects that I would like to understand in order to integrate a
> long-span model correctly.
>
> 1/ In the Evaluate(const Hypothesis &hypo, const FFState *ps,
> ScoreComponentCollection *out) const method, moses/LM/Kenlm.cpp
> calculates the score over only the first N - 1 words, where N is the
> n-gram order. Why not calculate the score for all words in the target
> phrase, from hypo.GetCurrTargetWordsRange().GetStartPos() to
> hypo.GetCurrTargetWordsRange().GetEndPos()? Is this an optimisation, or
> is it required to make the translation work?
It's an optimization. The Nth and subsequent words were already scored
when the phrase table was loaded, so why bother scoring them again?

> 2/ At the time the translation phrase table is loaded, Moses calls:
>
> void LanguageModel::Evaluate(const Phrase &source, const TargetPhrase
> &targetPhrase, ScoreComponentCollection &scoreBreakdown,
> ScoreComponentCollection &estimatedFutureScore) const
>
> where LanguageModel is the ancestor of the LanguageModelKen<Model>
> template implemented in Kenlm.cpp. Moses calls this Evaluate() method
> for each source-target translation phrase and assigns the accumulated
> score of the first N-1 words to estimatedFutureScore and the accumulated
> score over the rest of the phrase to scoreBreakdown. What is the purpose
> of this split of the n-gram scores into the first N-1 words and the rest
> of the phrase? Is the scoreBreakdown value added to the score calculated
> by the Evaluate() method from point 1/ during search to get the total
> phrase score?

The future part, which includes everything, is used for cube pruning
prioritization and future cost estimates. The score for the Nth and
subsequent words is added to the score of the first N-1 words under
point 1.

> 3/ The method LanguageModelKen<Model>::CalcScore(const Phrase &phrase,
> float &fullScore, float &ngramScore, size_t &oovCount) const, which is
> also used in the calculation in point 2/ above, distinguishes between
> terminal and non-terminal words. What are these? Is the distinction
> relevant to long-span LMs? Why is the terminal/non-terminal distinction
> not necessary in the Evaluate() method described in point 1/ above?

Evaluate is only used by phrase-based MT. CalcScore is used by both
syntactic and phrase-based MT. Syntactic MT can have non-terminals.

> After changing my implementation to incorporate the behaviour described
> above, I got an exact match between my output and that of Moses's
> "native" KenLM implementation.
> However, the behaviour in point 1/ is not suitable for long-span LMs,
> and point 2/ does not make sense for non-n-gram models. I would expect
> my long-span LM to operate in such a way that the LM state assigned to a
> hypothesis covers all target words from the start to the end of the
> hypothesis. When a hypothesis is extended, its LM state would be
> extended one target word at a time in a loop over the new phrase from
> start to finish. The n-gram LM implementation does not work in this way,
> and this seems to harm n-gram performance. Can anyone shed some light on
> the motivation behind the behaviour described above in points 1-3?

If you want good search accuracy with your long-distance LM, then you
need the ability to estimate the scores of phrases without knowing what
will appear before them. These estimates are critical to driving search.
It is likely that, once you implement such estimates, they will take a
form similar to the n-gram implementation's.

> I used Moses with its default, a.k.a. "normal", search algorithm (no
> [search-algorithm] variable specified in my config). For completeness,
> my config when using Moses with its KenLM class is pasted below.
> Best regards,
> David
>
> # input factors
> [input-factors]
> 0
>
> # mapping steps
> [mapping]
> 0 T 0
>
> [distortion-limit]
> 6
>
> # feature functions
> [feature]
> UnknownWordPenalty
> WordPenalty
> PhrasePenalty
> PhraseDictionaryMemory name=TranslationModel0 table-limit=20
> num-features=4 path=model/phrase-table.1.gz input-factor=0 output-factor=0
> LexicalReordering name=LexicalReordering0 num-features=6
> type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0
> path=model/reordering-table.1.wbe-msd-bidirectional-fe.gz
> Distortion
> KENLM lazyken=1 name=LM0 factor=0 path=lm/europarl.binlm.1 order=5
>
> # dense weights for feature functions
> [weight]
> UnknownWordPenalty0= 1
> WordPenalty0= -1
> PhrasePenalty0= 0.2
> TranslationModel0= 0.2 0.2 0.2 0.2
> LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
> Distortion0= 0.3
> LM0= 0.5

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
