Hi,

> But you're asking for a third piece of information. If you query for
> "foo bar baz" and I can tell you that it will never extend to "* foo bar
> baz" for any word * (due to pruning or filtering), then you need only
> remember "foo bar" (or even less). The trie knows this because the
> pointers are equal, but it currently isn't telling you. Probing could
> tell you this if I used the otherwise-unused probability sign bit to
> encode it.
The thinking here is this: given a prefix "A B C D" and a language model of order 5, we can ignore D if the n-gram "A B C" is unknown. Why? Because if "A B C" is unknown, then every "* A B C" is also unknown, assuming sane low-count pruning. So there will always be a free back-off to the lower-order n-gram. Merely knowing that there is no "* A B C D" in the language model may not be helpful, since different "* A B C" have different back-off costs.

-phi
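To make the two checks concrete, here is a rough Python sketch over a toy model stored as a plain set of n-grams. It is only an illustration: the names `MODEL`, `extends_left` and `can_ignore_last_word` are made up for this example and are not part of KenLM or Moses, and a real implementation would read this information from the trie or probing data structure rather than scanning a set.

```python
from typing import Set, Tuple

NGram = Tuple[str, ...]

# Hypothetical toy model: just the set of n-grams that survived pruning.
# Note that "A B C" is absent, and so is every "* A B C".
MODEL: Set[NGram] = {
    ("A",), ("B",), ("C",), ("D",), ("X",),
    ("A", "B"), ("B", "C"), ("C", "D"),
    ("X", "A", "B"),
}


def extends_left(ngram: NGram, model: Set[NGram]) -> bool:
    """True if some n-gram "* ngram" (one more word on the left) is in the model."""
    return any(len(g) == len(ngram) + 1 and g[1:] == ngram for g in model)


def can_ignore_last_word(prefix: NGram, model: Set[NGram]) -> bool:
    """The test described above: the last word of the prefix can be ignored
    if the n-gram formed by the words before it is unknown.
    Assumes sane pruning, i.e. an unknown n-gram has no known left extension."""
    return len(prefix) > 1 and prefix[:-1] not in model


if __name__ == "__main__":
    # "A B C" is unknown, so D can be ignored in the prefix "A B C D".
    print(can_ignore_last_word(("A", "B", "C", "D"), MODEL))
    # "A B" is known, so C cannot be ignored in the prefix "A B C".
    print(can_ignore_last_word(("A", "B", "C"), MODEL))
    # The quoted check: does "A B" extend to some "* A B"?
    print(extends_left(("A", "B"), MODEL))
```

The first function corresponds to the information in the quoted message (whether "* foo bar baz" can exist), the second to the stricter test on the shorter n-gram "A B C".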
