Marc makes a good point. When one language model provides more information than the others do, it's difficult to maintain a common abstraction layer. Right now the information in question is n-gram length. SRILM doesn't provide access to that (but you can get the right-looking state length, which is usually the same thing).
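To make the trade-off concrete, here is a minimal sketch of the "ngram_length in LMResult, -1 where the backend can't say" idea from the thread below. Everything in it is illustrative: the struct is only loosely modeled on the result type under discussion, and ToyLanguageModel, ReportingLM, OpaqueLM, IsFullMatch and kNGramLengthUnknown are hypothetical names, not existing Moses, KenLM or SRILM code.

```cpp
#include <string>
#include <vector>

// Hypothetical sentinel: this wrapper cannot report the matched n-gram length.
constexpr int kNGramLengthUnknown = -1;

// Illustrative result struct. ngram_length is the proposed addition from
// this thread; the layout is an assumption, not the actual Moses definition.
struct LMResult {
  float score = 0.0f;
  bool unknown = false;
  int ngram_length = kNGramLengthUnknown;
};

// Toy stand-in for the common language model wrapper interface.
class ToyLanguageModel {
 public:
  virtual ~ToyLanguageModel() = default;
  virtual LMResult Score(const std::vector<std::string> &words) const = 0;
};

// Stand-in for a wrapper whose backend reports the match length.
class ReportingLM : public ToyLanguageModel {
 public:
  LMResult Score(const std::vector<std::string> &words) const override {
    LMResult r;
    r.score = -1.0f;  // placeholder score, not a real probability
    // Pretend the whole query was found in the model.
    r.ngram_length = static_cast<int>(words.size());
    return r;
  }
};

// Stand-in for a wrapper whose backend does not expose the match length.
class OpaqueLM : public ToyLanguageModel {
 public:
  LMResult Score(const std::vector<std::string> &) const override {
    LMResult r;
    r.score = -1.0f;  // placeholder score
    return r;         // ngram_length stays at the sentinel
  }
};

// Callers have to check the sentinel before trusting the length.
bool IsFullMatch(const LMResult &r, int order) {
  return r.ngram_length != kNGramLengthUnknown && r.ngram_length == order;
}
```

The cost shows up in IsFullMatch: every caller has to handle the "don't know" case, and that is exactly the ugliness that grows as more optional features are added.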
I'm working on making this issue more severe with left-looking state optimization and explicit hypothesis bounds. How do we change the decoder to use these features if not all of the language models support them? Maybe another class in the language model hierarchy could support these additional features. But it's going to make the decoder look ugly if you want to support both.

On 07/21/11 11:14, Hieu Hoang wrote:
> hi marc,
>
> it'll be good for people to see your changes.
>
> i suppose you should create a branch and make your changes in there.
>
> If there are other people interested, you can point them to your branch.
> If more people are interested and it doesn't affect other people too
> much, then we can move it to trunk.
>
> i'll email you offline with svn details
>
> On 21/07/2011 15:16, Marc LEGENDRE wrote:
>> Alright, I gave this a try, and it did it for me.
>> With kenlm, it is a ridiculously straightforward modification,
>> but now I'm not sure how I can submit it:
>> on one hand, I am not a "machine translation guy" and I don't imagine myself
>> digging in every other LM to find how to set the ngram_length value;
>> and on the other hand I would feel guilty to submit a 10-line patch and say
>> "Guys, I need this, would you mind committing it and doing yourselves the
>> necessary modifications in every other wrapper?"
>>
>> How do you, Moses developers, feel about this?
>> Is it acceptable / outrageously stupid if I set the value to -1 in the other
>> wrappers, maybe with a TODO, and properly document it in the super class?
>>
>> ----- Original Message -----
>>> From: "Kenneth Heafield"<[email protected]>
>>> To: [email protected]
>>> Sent: Wednesday, 13 July 2011 20:53:46
>>> Subject: Re: [Moses-support] Using Moses language models
>>>
>>> I'd suggest adding a ngram_length member to LMResult then modifying each
>>> model's wrapper (or just mine) to set that value.
>>>
>>> You're welcome to move stuff from LanguageModelKen.cpp to
>>> LanguageModelKen.h as necessary. I chose this setup to minimize
>>> unnecessary includes.
>>>
>>> Kenneth
>>>
>>> On 07/13/11 14:33, Marc LEGENDRE wrote:
>>>> Well, not only is the header not "public", so to speak (which I
>>>> agree is not a major obstacle),
>>>> but the desired pointer is also a private member of the class, and
>>>> sadly lacks a getter.
>>>> As far as I know, it means that accessing it will involve
>>>> questionable C++ tricks.
>>>> (never tried, though)
>>>>
>>>> If modifying Moses is not too much of a chore, I'll give it a
>>>> thought.
>>>>
>>>> Anyway, thank you for your answers.
>>>>
>>>> ----- Original Message -----
>>>>> From: "Hieu Hoang"<[email protected]>
>>>>> To: [email protected]
>>>>> Sent: Wednesday, 13 July 2011 18:40:11
>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>> i guess lm::Model is specific to the ken lm implementation. If you want
>>>>> to use it you should include the header yourself and cast whatever you
>>>>> need to get the pointer.
>>>>>
>>>>> if you're feeling generous, maybe you can extend the moses LM wrapper so
>>>>> that all LM implementations have the opportunity to return the length
>>>>> of the n-gram match.
>>>>>
>>>>> On 13/07/2011 21:51, Marc LEGENDRE wrote:
>>>>>> The length of the n-gram match is sufficient for what I want, indeed.
>>>>>> I figured out how to get it using kenlm directly, but as I am
>>>>>> running the decoder, I wanted to use the already loaded LM.
>>>>>>
>>>>>> I first tried to dig my way through the Moses abstraction layers to
>>>>>> retrieve a pointer to a lm::Model from kenlm, but the
>>>>>> Moses::LanguageModelKen header is not part of the public headers of
>>>>>> Moses; that's why I tried to use only the Moses interface.
>>>>>>
>>>>>> (I did not mention this alternative; if someone knows how to
>>>>>> get such a pointer, I can carry on from there)
>>>>>>
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Kenneth Heafield"<[email protected]>
>>>>>>> To: "Marc LEGENDRE"<[email protected]>
>>>>>>> Sent: Wednesday, 13 July 2011 16:12:27
>>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>> The definition of unknown is that the word you asked for (the
>>>>>>> rightmost one) is mapped to <unk>, i.e. an OOV.
>>>>>>>
>>>>>>> Are you looking for:
>>>>>>>
>>>>>>> 1) Length of n-gram matched in the model
>>>>>>>
>>>>>>> or
>>>>>>>
>>>>>>> 2) Length of state you must keep for valid continuation to the right
>>>>>>>
>>>>>>> These are slightly different things due to state minimization. The
>>>>>>> moses abstraction layer does not return either in a general way.
>>>>>>> However, if you're using KenLM, #2 is in the returned state's
>>>>>>> valid_length_. Further, #1 is in FullScoreReturn.ngram_length. So if
>>>>>>> you call KenLM directly these are easy to obtain (and you can decide
>>>>>>> whether to expose them through the Moses abstraction layer).
>>>>>>>
>>>>>>> Outside the decoder, you can run
>>>>>>>
>>>>>>> kenlm/query model_file null
>>>>>>>
>>>>>>> then provide your trigrams on stdin.
>>>>>>>
>>>>>>> Here's an example with kenlm/query kenlm/lm/test.arpa null
>>>>>>>
>>>>>>> looking on a
>>>>>>> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
>>>>>>> Total: -1.79818 OOV: 0
>>>>>>>
>>>>>>> The format is "word=vocab_id ngram_length score". So this is a trigram
>>>>>>> in the model because "a=5 3" appears.
>>>>>>>
>>>>>>> On 07/13/11 08:50, Marc LEGENDRE wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> I am trying to use the language models loaded by Moses;
>>>>>>>>
>>>>>>>> I am using a 3-gram LM, and I need to know whether it contains a
>>>>>>>> given N-gram or not.
>>>>>>>> I tried to play around with
>>>>>>>> LanguageModelImplementation::GetValueForgotState(...),
>>>>>>>> but the boolean 'unknown' in the returned structure does not seem to
>>>>>>>> be what I'm looking for.
>>>>>>>>
>>>>>>>> Is there any simple way of getting this piece of information?
>>>>>>>>
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Marc Legendre
>>>>>>>> _______________________________________________
>>>>>>>> Moses-support mailing list
>>>>>>>> [email protected]
>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
