Admittedly, the n-gram length (capped to N-1) gets you most of the way there, and I already provide that.
But you're asking for a third piece of information. If you query for "foo bar baz" and I can tell you that it will never extend to "* foo bar baz" for any word * (due to pruning or filtering), then you need only remember "foo bar" (or even less). The trie knows this because the pointers are equal, but it currently isn't telling you. Probing could tell you this if I used the otherwise-unused probability sign bit to encode it.

It will come soon. But I've been working on saving memory and making a pretty presentation for WMT.

Kenneth

On 07/13/11 14:54, Philipp Koehn wrote:
> Hi,
>
> just to comment that it would indeed be very useful to know
> at what order an n-gram lookup has been resolved. One application
> for this would be improved recombination for the left context of phrases
> in chart decoder hypotheses.
>
> -phi
>
> On Wed, Jul 13, 2011 at 7:33 PM, Marc LEGENDRE <[email protected]> wrote:
>
> Well, not only is the header not "public", so to speak (which I
> agree is not a major obstacle),
> but the desired pointer is also a private member of the class, and
> sadly lacks a getter.
> As far as I know, this means that accessing it will involve
> questionable C++ tricks
> (never tried, though).
>
> If modifying Moses is not too much of a chore, I'll give it a thought.
>
> Anyway, thank you for your answers.
>
> ----- Original Message -----
> > From: "Hieu Hoang" <[email protected]>
> > To: [email protected]
> > Sent: Wednesday, 13 July 2011 18:40:11
> > Subject: Re: [Moses-support] Using Moses language models
> > i guess lm::Model is specific to the ken lm implementation. If you
> > want to use it you should include the header yourself and cast
> > whatever you need to get the pointer.
> >
> > if you're feeling generous, maybe you can extend the moses LM wrapper
> > so that all LM implementations have the opportunity to return the
> > length of the n-gram match.
> >
> > On 13/07/2011 21:51, Marc LEGENDRE wrote:
> > > The length of the n-gram match is sufficient for what I want, indeed.
> > > I figured out how to get it using kenlm directly, but as I am
> > > running the decoder, I wanted to use the already loaded LM.
> > >
> > > I first tried to dig my way through the Moses abstraction layers to
> > > retrieve a pointer to a lm::Model from kenlm, but the
> > > Moses::LanguageModelKen header is not part of the public headers of
> > > Moses; that's why I tried to use only the Moses interface.
> > >
> > > (I did not mention this alternative; if someone knows how to
> > > get such a pointer, I can carry on from there.)
> > >
> > >
> > > ----- Original Message -----
> > >> From: "Kenneth Heafield" <[email protected]>
> > >> To: "Marc LEGENDRE" <[email protected]>
> > >> Sent: Wednesday, 13 July 2011 16:12:27
> > >> Subject: Re: [Moses-support] Using Moses language models
> > >> The definition of unknown is that the word you asked for (the
> > >> rightmost one) is mapped to <unk>, i.e. an OOV.
> > >>
> > >> Are you looking for:
> > >>
> > >> 1) Length of n-gram matched in the model
> > >>
> > >> or
> > >>
> > >> 2) Length of state you must keep for valid continuation to the
> > >> right
> > >>
> > >> These are slightly different things due to state minimization. The
> > >> Moses abstraction layer does not return either in a general way.
> > >> However, if you're using KenLM, #2 is in the returned state's
> > >> valid_length_. Further, #1 is in FullScoreReturn.ngram_length. So
> > >> if you call KenLM directly these are easy to obtain (and you can
> > >> decide whether to expose them through the Moses abstraction layer).
> > >>
> > >> Outside the decoder, you can run
> > >>
> > >> kenlm/query model_file null
> > >>
> > >> then provide your trigrams on stdin.
> > >>
> > >> Here's an example with kenlm/query kenlm/lm/test.arpa null
> > >>
> > >> looking on a
> > >> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
> > >> Total: -1.79818 OOV: 0
> > >>
> > >> The format is "word=vocab_id ngram_length score". So this is a
> > >> trigram in the model because "a=5 3" appears.
> > >>
> > >> On 07/13/11 08:50, Marc LEGENDRE wrote:
> > >>> Hello,
> > >>>
> > >>> I am trying to use the language models loaded by Moses.
> > >>>
> > >>> I am using a 3-gram LM, and I need to know whether it contains a
> > >>> given N-gram or not.
> > >>> I tried to play around with
> > >>> LanguageModelImplementation::GetValueForgotState(...),
> > >>> but the boolean 'unknown' in the returned structure does not seem
> > >>> to be what I'm looking for.
> > >>>
> > >>> Is there any simple way of getting this piece of information?
> > >>>
> > >>>
> > >>> Regards,
> > >>> Marc Legendre
> > >>> _______________________________________________
> > >>> Moses-support mailing list
> > >>> [email protected]
> > >>> http://mailman.mit.edu/mailman/listinfo/moses-support
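
For anyone post-processing the output of kenlm/query, here is a minimal Python sketch of how the "word=vocab_id ngram_length score" format quoted in this thread could be read to decide whether a full n-gram was matched. It assumes the per-word output is exactly those whitespace-separated triples and that any "Total: ... OOV: ..." summary either sits on its own line or trails the triples. The names Token, parse_query_line and full_match are made up for illustration; they are not part of KenLM or Moses.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    vocab_id: int
    ngram_length: int
    score: float


def parse_query_line(line: str) -> List[Token]:
    """Parse 'word=vocab_id ngram_length score' triples from one output line."""
    # Assumption: a trailing 'Total: ... OOV: ...' summary, if present, is dropped.
    scored = line.split("Total:", 1)[0]
    fields = scored.split()
    if len(fields) % 3 != 0:
        raise ValueError("expected triples of 'word=vocab_id ngram_length score'")
    tokens = []
    for i in range(0, len(fields), 3):
        # rpartition splits at the last '=', so words containing '=' survive.
        word, _, vocab_id = fields[i].rpartition("=")
        tokens.append(
            Token(word, int(vocab_id), int(fields[i + 1]), float(fields[i + 2]))
        )
    return tokens


def full_match(line: str, order: int) -> bool:
    """True if the last word was scored using an n-gram of length `order`."""
    tokens = parse_query_line(line)
    return bool(tokens) and tokens[-1].ngram_length == order


sample = "looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513"
print(full_match(sample, 3))  # True: "a=5 3" means the trigram is in the model
```

Checking only the last word's ngram_length mirrors the reasoning in the thread: the trigram "looking on a" is in the model because the final entry reads "a=5 3".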
