Re: [Moses-support] question about recombination when trying to output phrase lattices

Nicola Bertoldi Fri, 05 Feb 2010 05:01:48 -0800

Dear Kevin,

recombination takes into account only the history that the LM ACTUALLY uses 
to compute the probability.


It can happen that the LM does not know any 4-grams with history ", beer a" or 
"<s> beer a". In that case the LM always backs off to the common history 
"beer a", so both hypotheses have the same effective LM state and can be 
recombined safely.
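
To illustrate the idea, here is a minimal Python sketch of a toy model (not 
Moses code). The function name effective_state and the representation of the 
LM as a plain set of n-gram tuples are assumptions made for this example only. 
A context counts as "used" if at least one n-gram in the toy LM extends it; 
otherwise the context is shortened, which mimics the LM always backing off.

```python
def effective_state(words, known_ngrams, order=4):
    """Return the longest suffix of `words` (at most order-1 words) that the
    toy LM can actually use as a history.

    Assumption: a history is usable only if some known n-gram starts with it.
    """
    max_ctx = order - 1
    for n in range(min(max_ctx, len(words)), 0, -1):
        ctx = tuple(words[-n:])
        # Is there any (n+1)-gram whose first n words equal this context?
        if any(len(g) == n + 1 and g[:-1] == ctx for g in known_ngrams):
            return ctx
    return ()  # no usable history at all


# Toy LM with no 4-gram whose history is ", beer a" or "<s> beer a".
toy_lm = {("beer", "a", "day"), ("<s>", "beer"), (",", "beer"), ("beer", "a")}

state_47 = effective_state(["<s>", "beer", "a"], toy_lm)
state_62 = effective_state([",", "beer", "a"], toy_lm)

# Equal effective states mean the two hypotheses may be recombined.
can_recombine = state_47 == state_62

# If the LM did contain a 4-gram with history ", beer a", the states differ.
richer_lm = toy_lm | {(",", "beer", "a", "day")}
state_62_richer = effective_state([",", "beer", "a"], richer_lm)
```

In this toy setting both hypotheses reduce to the history ("beer", "a"), so 
recombining them loses nothing; once a 4-gram with one of the longer histories 
is added, the states differ and recombination would no longer be safe.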

Could you please check whether your LM contains any 4-grams with history 
", beer a" or "<s> beer a"?
Which type of LM are you using (SRI or IRST)?

Nicola


On 2/5/10 1:47 PM, "Kevin Gimpel" <[email protected]> wrote:

Hey all,

I'm trying to construct a phrase lattice as output from Moses.  I have been 
playing around with "-output-search-graph" and "-verbose 3" and have become 
confused about recombination and how it preserves language model states.

For example, if I translate "hier ist ein bier" from German to English and use 
a 4-gram language model, I see the following lines as part of the output when 
using -output-search-graph:

...
0 hyp=17 stack=1 back=0 score=-5.57705 transition=-5.57705 forward=35 fscore=-10.8062 covered=3-3 out=beer , pC=0.131725, c=-2.6988
0 hyp=18 stack=1 back=0 score=-8.39914 transition=-8.39914 forward=50 fscore=-11.1884 covered=3-3 out=, beer , pC=-1.81449, c=-5.14484
...
0 hyp=47 stack=2 back=17 score=-11.4177 transition=-5.84061 forward=173 fscore=-12.6408 covered=2-2 out=a , pC=-0.318772, c=-1.8764
0 hyp=62 stack=2 back=18 score=-13.6186 transition=-5.2195 recombined=47 forward=173 fscore=-12.6408 covered=2-2 out=a , pC=-0.318772, c=-1.8764

I am surprised that recombination occurs in the last line shown, because 
hypothesis 62 ends in ", beer a" while hypothesis 47 ends in "<s> beer a" -- 
causing future hypotheses that come from 47 or 62 to have different 4-gram 
language model probabilities.  I had been thinking that recombination was a 
risk-free pruning method of the search space as described in the Moses 
background page / original Pharaoh paper 
(http://www.statmt.org/moses/?n=Moses.Background), but maybe my assumption is 
obsolete.
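
The trailing words of each hypothesis can be reconstructed from the excerpt 
above by following the back= pointers. The Python sketch below does this for 
the four lines shown. It is only an illustration: the regular expression is 
inferred from these sample lines (it assumes the phrase after out= is followed 
by " , pC="), hypothesis 0 is assumed to be the empty initial hypothesis, and 
the helper names parse and trailing_words are made up for this example.

```python
import re

# The four search-graph lines from the excerpt, each joined onto one line.
SAMPLE = [
    "0 hyp=17 stack=1 back=0 score=-5.57705 transition=-5.57705 forward=35 "
    "fscore=-10.8062 covered=3-3 out=beer , pC=0.131725, c=-2.6988",
    "0 hyp=18 stack=1 back=0 score=-8.39914 transition=-8.39914 forward=50 "
    "fscore=-11.1884 covered=3-3 out=, beer , pC=-1.81449, c=-5.14484",
    "0 hyp=47 stack=2 back=17 score=-11.4177 transition=-5.84061 forward=173 "
    "fscore=-12.6408 covered=2-2 out=a , pC=-0.318772, c=-1.8764",
    "0 hyp=62 stack=2 back=18 score=-13.6186 transition=-5.2195 recombined=47 "
    "forward=173 fscore=-12.6408 covered=2-2 out=a , pC=-0.318772, c=-1.8764",
]

# Assumed layout: hyp=<id> ... back=<id> ... out=<phrase> , pC=...
LINE = re.compile(r"hyp=(\d+) .*?back=(\d+) .*?out=(.*?) , pC=")


def parse(lines):
    """Map hypothesis id -> (back pointer, list of output words)."""
    hyps = {}
    for line in lines:
        m = LINE.search(line)
        if m:
            hyps[int(m.group(1))] = (int(m.group(2)), m.group(3).split())
    return hyps


def trailing_words(hyps, hyp_id, n=3):
    """Follow back pointers and return the last n words, padded with <s>."""
    words = []
    while hyp_id in hyps:  # stops at hypothesis 0, which is not listed
        back, out = hyps[hyp_id]
        words = out + words
        hyp_id = back
    return (["<s>"] + words)[-n:]


hyps = parse(SAMPLE)
tail_47 = trailing_words(hyps, 47)
tail_62 = trailing_words(hyps, 62)
```

Here tail_47 and tail_62 hold the three-word histories that a 4-gram LM would 
see for the next word, which makes the difference between the two recombined 
hypotheses explicit.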

I can see a few possibilities here:
1. Moses checks all necessary LM probabilities for the given trailing trigrams 
in each hypothesis and determines that the recombination can take place safely 
(e.g., no possible phrases following ", beer a" would give lower cost than "<s> 
beer a").  This is indeed a risk-free strategy.
2. Moses only checks the trailing words in the _current_ "hypotheses" when 
deciding to recombine and doesn't look at previous hypotheses. So, 62 would 
recombine with 47 because they both end with "a", regardless of what 17 and 18 
end in.
3. Moses only checks at most the last two words in each hypothesis when trying 
to recombine, regardless of what order of language model is used.
4. Something else?

Thanks!
Kevin



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
