Alright, I gave this a try, and it works for me. With kenlm it is a ridiculously straightforward modification, but now I'm not sure how to submit it: on the one hand, I am not a "machine translation guy" and I can't see myself digging into every other LM to find out how to set the ngram_length value; on the other hand, I would feel guilty submitting a 10-line patch and saying "Guys, I need this, would you mind committing it and making the necessary modifications in every other wrapper yourselves?"
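
To make the discussion concrete, here is a minimal standalone sketch of the kind of change in question. It is not the actual patch. LMResult and ngram_length are the names used in this thread; FakeFullScoreReturn, FromKen, FromOther, MatchedFullNgram and kNgramLengthUnknown are hypothetical stand-ins, so that the snippet compiles without Moses or KenLM. The idea is that the kenlm wrapper copies the match length through, and any wrapper that cannot report it leaves a sentinel (-1 in this sketch).

```cpp
#include <cassert>

// Sentinel meaning "this wrapper does not report a match length".
// The value -1 is an assumption of this sketch.
constexpr int kNgramLengthUnknown = -1;

// Simplified stand-in for the Moses LMResult structure, extended with the
// ngram_length member discussed in this thread.
struct LMResult {
  float score = 0.0f;
  bool unknown = false;
  int ngram_length = kNgramLengthUnknown;
};

// Hypothetical stand-in for the two fields of kenlm's FullScoreReturn that
// matter here, so the sketch builds without the kenlm headers.
struct FakeFullScoreReturn {
  float prob;
  unsigned char ngram_length;
};

// What the kenlm wrapper would do: pass the matched length through.
LMResult FromKen(const FakeFullScoreReturn &ret, bool unknown) {
  LMResult result;
  result.score = ret.prob;
  result.unknown = unknown;
  result.ngram_length = ret.ngram_length;
  return result;
}

// What every other wrapper would do for now: leave the sentinel in place.
LMResult FromOther(float score, bool unknown) {
  LMResult result;
  result.score = score;
  result.unknown = unknown;
  // TODO: report the real match length for this LM implementation.
  return result;
}

// True only if the wrapper reported a match of exactly `order` words.
// A sentinel value never counts as a match.
bool MatchedFullNgram(const LMResult &result, int order) {
  assert(order > 0);
  return result.ngram_length == order;
}
```

With the numbers from the kenlm/query example quoted below ("a=5 3 -0.0483513"), MatchedFullNgram(FromKen({-0.0483513f, 3}, false), 3) is true, while a result built by FromOther never reports a full match because its ngram_length stays at the sentinel.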
How do you, Moses developers, feel about this? Is it acceptable (or outrageously stupid) if I set the value to -1 in the other wrappers, maybe with a TODO, and properly document it in the superclass?

----- Original message -----
> From: "Kenneth Heafield" <[email protected]>
> To: [email protected]
> Sent: Wednesday, 13 July 2011 20:53:46
> Subject: Re: [Moses-support] Using Moses language models
>
> I'd suggest adding a ngram_length member to LMResult then modifying
> each model's wrapper (or just mine) to set that value.
>
> You're welcome to move stuff from LanguageModelKen.cpp to
> LanguageModelKen.h as necessary. I chose this setup to minimize
> unnecessary includes.
>
> Kenneth
>
> On 07/13/11 14:33, Marc LEGENDRE wrote:
> > Well, not only is the header not "public", so to speak (which I
> > agree is not a major obstacle), but the desired pointer is also a
> > private member of the class, and sadly lacks a getter.
> > As far as I know, that means accessing it will involve
> > questionable C++ tricks.
> > (never tried, though)
> >
> > If modifying Moses is not too much of a chore, I'll give it a
> > thought.
> >
> > Anyway, thank you for your answers.
> >
> > ----- Original message -----
> >> From: "Hieu Hoang" <[email protected]>
> >> To: [email protected]
> >> Sent: Wednesday, 13 July 2011 18:40:11
> >> Subject: Re: [Moses-support] Using Moses language models
> >> i guess lm::Model is specific to the ken lm implementation. If you
> >> want to use it you should include the header yourself and cast
> >> whatever you need to get the pointer.
> >>
> >> if you're feeling generous, maybe you can extend the moses LM
> >> wrapper so that all LM implementations have the opportunity to
> >> return the length of the n-gram match.
> >>
> >> On 13/07/2011 21:51, Marc LEGENDRE wrote:
> >>> The length of the n-gram match is sufficient for what I want,
> >>> indeed.
> >>> I figured out how to get it using kenlm directly, but as I am
> >>> running the decoder, I wanted to use the already loaded LM.
> >>>
> >>> I first tried to dig my way through the Moses abstraction layers
> >>> to retrieve a pointer to a lm::Model from kenlm, but the
> >>> Moses::LanguageModelKen header is not part of the public headers
> >>> of Moses; that's why I tried to use only the Moses interface.
> >>>
> >>> (I did not mention this alternative; if someone knows how to get
> >>> such a pointer, I can carry on from there)
> >>>
> >>>
> >>> ----- Original message -----
> >>>> From: "Kenneth Heafield" <[email protected]>
> >>>> To: "Marc LEGENDRE" <[email protected]>
> >>>> Sent: Wednesday, 13 July 2011 16:12:27
> >>>> Subject: Re: [Moses-support] Using Moses language models
> >>>> The definition of unknown is that the word you asked for (the
> >>>> rightmost one) is mapped to <unk> i.e. an OOV.
> >>>>
> >>>> Are you looking for:
> >>>>
> >>>> 1) Length of n-gram matched in the model
> >>>>
> >>>> or
> >>>>
> >>>> 2) Length of state you must keep for valid continuation to the
> >>>> right
> >>>>
> >>>> These are slightly different things due to state minimization.
> >>>> The moses abstraction layer does not return either in a general
> >>>> way. However, if you're using KenLM, #2 is in the returned
> >>>> state's valid_length_. Further, #1 is in
> >>>> FullScoreReturn.ngram_length. So if you call KenLM directly
> >>>> these are easy to obtain (and you can decide whether to expose
> >>>> them through the Moses abstraction layer).
> >>>>
> >>>> Outside the decoder, you can run
> >>>>
> >>>> kenlm/query model_file null
> >>>>
> >>>> then provide your trigrams on stdin.
> >>>>
> >>>> Here's an example with kenlm/query kenlm/lm/test.arpa null
> >>>>
> >>>> looking on a
> >>>> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
> >>>> Total: -1.79818 OOV: 0
> >>>>
> >>>> The format is "word=vocab_id ngram_length score". So this is a
> >>>> trigram in the model because "a=5 3" appears.
> >>>>
> >>>> On 07/13/11 08:50, Marc LEGENDRE wrote:
> >>>>> Hello,
> >>>>>
> >>>>> I am trying to use the language models loaded by Moses;
> >>>>>
> >>>>> I am using a 3-gram LM, and I need to know whether it contains
> >>>>> a given N-gram or not.
> >>>>> I tried to play around with
> >>>>> LanguageModelImplementation::GetValueForgotState(...),
> >>>>> but the boolean 'unknown' in the returned structure does not
> >>>>> seem to be what I'm looking for.
> >>>>>
> >>>>> Is there any simple way of getting this piece of information?
> >>>>>
> >>>>>
> >>>>> Regards,
> >>>>> Marc Legendre
> >>>>> _______________________________________________
> >>>>> Moses-support mailing list
> >>>>> [email protected]
> >>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
