I support depending on Boost, but sadly some people don't. PhraseDictionaryTree.cpp:3 in your branch includes a Boost header.
Kenneth

On 08/24/11 10:17, Marc LEGENDRE wrote:
> Hi,
>
> I merged the trunk into my branch; it looks ok.
> May my little modification to LMKen.h/cpp finally be merged into the trunk?
> (not the useless changes to PhraseDictionaryTree)
>
> Thanks, (And sorry for my slow response, I hope you remember me!)
>
> Marc
>
> ----- Original Message -----
>> From: "Hieu Hoang" <[email protected]>
>> To: "Marc LEGENDRE" <[email protected]>
>> Cc: "Kenneth Heafield" <[email protected]>, [email protected]
>> Sent: Wednesday, 27 July 2011 13:34:35
>> Subject: Re: [Moses-support] Using Moses language models
>>
>> hi marc,
>>
>> thx for the commits.
>>
>> the regression test failed probably because the decoder wasn't compiled
>> with SRI or IRST LM, which some of the regression tests specify. I
>> compiled your branch & it passes.
>>
>> I suppose for convenience, we should change it to use KenLM, with
>> specific tests for IRST & SRI.
>>
>> On 25/07/2011 21:51, Marc LEGENDRE wrote:
>>> Well, I actually committed in the augmLMResult branch.
>>>
>>> I inserted a class between LMKen and LMSingleFactor to prevent the
>>> inclusion of kenlm headers.
>>> (And yes, I now realize this may be the kind of thing you write in
>>> a commit message)
>>>
>>> Since the LanguageModelKen.h header now contains functions I want to use,
>>> can we add it to the list of the installed files? (&& How?)
>>>
>>>
>>> Also, I can't get the regression tests to work.
>>> I downloaded the test data && extracted it in /tmp; I read what
>>> I found, and this is the command I came up with
>>> ./regression-testing/run-test-suite.pl
>>> --decoder-phrase=moses-cmd/src/moses
>>> --decoder-chart=moses-chart-cmd/src/moses_chart
>>> But every test ends with a "MOSES CRASHED" message.
>>> (And the same thing happens with the trunk build)
>>> I tried to understand, and I noticed that the .ini files for the tests
>>> contain:
>>> [lmodel-file]
>>> 0 0 3 moses-reg-test-data-5/lm/europarl.en.srilm.gz
>>>
>>> Is that OK for kenlm?
>>>
>>> Marc
>>>
>>> ----- Original Message -----
>>>> From: "Kenneth Heafield" <[email protected]>
>>>> To: "Marc LEGENDRE" <[email protected]>
>>>> Cc: [email protected], [email protected]
>>>> Sent: Friday, 22 July 2011 20:18:21
>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>
>>>> Hi Marc,
>>>>
>>>> This sounds like a simple change, so a branch is probably too much
>>>> overhead. Please do one of the following:
>>>>
>>>> 1. Send a patch as generated by diff -rupN $old $new . Do a make clean
>>>> first.
>>>> 2. Attach the files you modified and send them, along with the revision
>>>> you based changes on.
>>>> 3. Make a branch (if you already did).
>>>>
>>>> Thanks,
>>>>
>>>> Kenneth
>>>>
>>>> On 07/22/11 04:21, Marc LEGENDRE wrote:
>>>>> Well, we (me and the people I work with) were hoping not to have to
>>>>> maintain a modified version of Moses.
>>>>>
>>>>> Luckily, obviousness just hit me like a truck: if something is
>>>>> specific to a LM, it does not have to be in the top layer.
>>>>> Having a common interface does not prevent subclasses from having a
>>>>> specific behaviour.
>>>>> We could have a LanguageModelKen method, say
>>>>> GetValueForgotStateKen(...), which would return
>>>>> something specific, say a LMKenResult, which would contain a
>>>>> LMResult plus other things
>>>>> like, say, a ngram_length field :-).
>>>>> And the virtual GetValueForgotState() method would simply return
>>>>> the LMResult from there.
>>>>>
>>>>> This way, no need to break the high level API,
>>>>> and no extra maintenance cost for us (me and the peop... Well, you
>>>>> know).
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Hieu Hoang" <[email protected]>
>>>>>> To: "Kenneth Heafield" <[email protected]>
>>>>>> Cc: [email protected]
>>>>>> Sent: Friday, 22 July 2011 04:50:14
>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>
>>>>>>
>>>>>> true, & there's no right answer to it.
>>>>>>
>>>>>> I suppose 1 goal of the trunk is to make sure that the core
>>>>>> functionality of translating isn't affected too much, in terms of
>>>>>> quality, speed, or memory. Another goal is to make sure not to
>>>>>> overburden the API with things no-one else uses or implements.
>>>>>>
>>>>>> therefore, i think a good strategy is to branch & do what you like
>>>>>>
>>>>>>
>>>>>> On 21 July 2011 22:46, Kenneth Heafield <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> Marc makes a good point. When one language model provides more
>>>>>> information than do other language models, it's difficult to
>>>>>> maintain a common abstraction layer. Currently we're looking at
>>>>>> n-gram length. SRILM doesn't provide access to that (but you can get
>>>>>> right-looking state length which is usually the same thing).
>>>>>>
>>>>>> I'm working on making this issue more severe with left-looking state
>>>>>> optimization and explicit hypothesis bounds. How do we change the
>>>>>> decoder to use these features if not all of the language models
>>>>>> support them?
>>>>>>
>>>>>> Maybe another class in the language model hierarchy supporting these
>>>>>> additional features. But it's going to make the decoder look ugly if
>>>>>> you want to support both.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 07/21/11 11:14, Hieu Hoang wrote:
>>>>>>> hi marc,
>>>>>>>
>>>>>>> it'll be good for people to see your changes.
>>>>>>>
>>>>>>> i suppose you should create a branch and make your changes in there.
>>>>>>>
>>>>>>> If there are other people interested, you can point them to your
>>>>>>> branch.
>>>>>>> If more people are interested and it doesn't affect other people too
>>>>>>> much, then we can move it to trunk.
>>>>>>>
>>>>>>> i'll email you offline with svn details
>>>>>>>
>>>>>>> On 21/07/2011 15:16, Marc LEGENDRE wrote:
>>>>>>>> Alright, I gave this a try, and it did it for me.
>>>>>>>> With kenlm, it is a ridiculously straightforward modification,
>>>>>>>> but now I'm not sure how I can submit it:
>>>>>>>> on one hand, I am not a "machine translation guy" and I don't
>>>>>>>> imagine myself digging in every other LM to find how to set the
>>>>>>>> ngram_length value;
>>>>>>>> and on the other hand I would feel guilty to submit a 10-line
>>>>>>>> patch and say
>>>>>>>> "Guys, I need this, would you mind committing it and doing
>>>>>>>> yourselves the necessary modifications in every other wrapper?"
>>>>>>>>
>>>>>>>> How do you, Moses developers, feel about this?
>>>>>>>> Is it acceptable / outrageously stupid if I set the value to -1 in
>>>>>>>> the other wrappers,
>>>>>>>> maybe with a TODO, and properly document it in the super class?
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: "Kenneth Heafield" <[email protected]>
>>>>>>>>> To: [email protected]
>>>>>>>>> Sent: Wednesday, 13 July 2011 20:53:46
>>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>>>>
>>>>>>>>> I'd suggest adding a ngram_length member to LMResult then modifying
>>>>>>>>> each model's wrapper (or just mine) to set that value.
>>>>>>>>>
>>>>>>>>> You're welcome to move stuff from LanguageModelKen.cpp to
>>>>>>>>> LanguageModelKen.h as necessary. I chose this setup to minimize
>>>>>>>>> unnecessary includes.
>>>>>>>>>
>>>>>>>>> Kenneth
>>>>>>>>>
>>>>>>>>> On 07/13/11 14:33, Marc LEGENDRE wrote:
>>>>>>>>>> Well, not only is the header not "public", so to speak (which I
>>>>>>>>>> agree is not a major obstacle),
>>>>>>>>>> but also the desired pointer is a private member of the class, and
>>>>>>>>>> sadly lacks a getter.
>>>>>>>>>> As far as I know, it means that accessing it will involve
>>>>>>>>>> questionable C++ tricks.
>>>>>>>>>> (never tried, though)
>>>>>>>>>>
>>>>>>>>>> If modifying Moses is not too much of a chore, I'll give it a
>>>>>>>>>> thought.
>>>>>>>>>>
>>>>>>>>>> Anyway, thank you for your answers.
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> From: "Hieu Hoang" <[email protected]>
>>>>>>>>>>> To: [email protected]
>>>>>>>>>>> Sent: Wednesday, 13 July 2011 18:40:11
>>>>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>>>>>> i guess lm::Model is specific to the ken lm implementation. If you
>>>>>>>>>>> want to use it you should include the header yourself and cast
>>>>>>>>>>> whatever you need to get the pointer.
>>>>>>>>>>>
>>>>>>>>>>> if you're feeling generous, maybe you can extend the moses LM
>>>>>>>>>>> wrapper so that all LM implementations have the opportunity to
>>>>>>>>>>> return the length of the n-gram match.
>>>>>>>>>>>
>>>>>>>>>>> On 13/07/2011 21:51, Marc LEGENDRE wrote:
>>>>>>>>>>>> The length of the n-gram match is sufficient for what I want,
>>>>>>>>>>>> indeed.
>>>>>>>>>>>> I figured out how to get it using kenlm directly, but as I am
>>>>>>>>>>>> running the decoder, I wanted to use the already loaded LM.
>>>>>>>>>>>>
>>>>>>>>>>>> I first tried to dig my way through the Moses abstraction layers to
>>>>>>>>>>>> retrieve a pointer to a lm::Model from kenlm, but the
>>>>>>>>>>>> Moses::LanguageModelKen header is not part of the public headers of
>>>>>>>>>>>> Moses; that's why I tried to use only the Moses interface.
>>>>>>>>>>>>
>>>>>>>>>>>> (I did not mention this alternative; if someone knows how to
>>>>>>>>>>>> get such a pointer, I can carry on from there)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>> From: "Kenneth Heafield" <[email protected]>
>>>>>>>>>>>>> To: "Marc LEGENDRE" <[email protected]>
>>>>>>>>>>>>> Sent: Wednesday, 13 July 2011 16:12:27
>>>>>>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>>>>>>>> The definition of unknown is that the word you asked for (the
>>>>>>>>>>>>> rightmost one) is mapped to <unk> i.e. an OOV.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you looking for:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Length of n-gram matched in the model
>>>>>>>>>>>>>
>>>>>>>>>>>>> or
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) Length of state you must keep for valid continuation to the right
>>>>>>>>>>>>>
>>>>>>>>>>>>> These are slightly different things due to state minimization. The
>>>>>>>>>>>>> moses abstraction layer does not return either in a general way.
>>>>>>>>>>>>> However, if you're using KenLM, #2 is in the returned state's
>>>>>>>>>>>>> valid_length_. Further, #1 is in FullScoreReturn.ngram_length. So if
>>>>>>>>>>>>> you call KenLM directly these are easy to obtain (and you can decide
>>>>>>>>>>>>> whether to expose them through the Moses abstraction layer).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Outside the decoder, you can run
>>>>>>>>>>>>>
>>>>>>>>>>>>> kenlm/query model_file null
>>>>>>>>>>>>>
>>>>>>>>>>>>> then provide your trigrams on stdin.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's an example with kenlm/query kenlm/lm/test.arpa null
>>>>>>>>>>>>>
>>>>>>>>>>>>> looking on a
>>>>>>>>>>>>> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
>>>>>>>>>>>>> Total: -1.79818 OOV: 0
>>>>>>>>>>>>>
>>>>>>>>>>>>> The format is "word=vocab_id ngram_length score". So this is a
>>>>>>>>>>>>> trigram in the model because "a=5 3" appears.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 07/13/11 08:50, Marc LEGENDRE wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am trying to use the language models loaded by Moses;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am using a 3-gram LM, and I need to know whether it contains a
>>>>>>>>>>>>>> given N-gram or not.
>>>>>>>>>>>>>> I tried to play around with
>>>>>>>>>>>>>> LanguageModelImplementation::GetValueForgotState(...),
>>>>>>>>>>>>>> but the boolean 'unknown' in the returned structure does not seem to
>>>>>>>>>>>>>> be what I'm looking for.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there any simple way of getting this piece of information?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Marc Legendre
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Moses-support mailing list
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
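
For reference, here is a minimal C++ sketch of the layering Marc proposes in the quoted thread: a KenLM-specific GetValueForgotStateKen() that returns an LMKenResult, with the virtual GetValueForgotState() simply unwrapping the generic LMResult. Everything in it is an illustration under assumptions: the LMResult fields, the simplified signatures (a vector of strings instead of the real Moses context and state arguments), the class names LanguageModelBase and LanguageModelKenSketch, and the placeholder scoring are all hypothetical and are not the code in the branch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Assumed shape of the generic result; the real Moses struct may differ.
struct LMResult {
  float score;
  bool unknown;  // rightmost word mapped to <unk>
};

// KenLM-specific result: the generic part plus the matched n-gram length.
struct LMKenResult {
  LMResult result;
  unsigned int ngram_length;
};

// Hypothetical common interface shared by all language model wrappers.
class LanguageModelBase {
 public:
  virtual ~LanguageModelBase() {}
  virtual LMResult GetValueForgotState(
      const std::vector<std::string>& context) const = 0;
};

// Hypothetical KenLM wrapper; a real one would query the loaded model.
class LanguageModelKenSketch : public LanguageModelBase {
 public:
  // KenLM-only entry point, invisible to code that uses the base class.
  LMKenResult GetValueForgotStateKen(
      const std::vector<std::string>& context) const {
    LMKenResult out;
    // Placeholder numbers standing in for a real KenLM lookup.
    out.result.score = -1.0f * static_cast<float>(context.size());
    out.result.unknown = false;
    out.ngram_length =
        static_cast<unsigned int>(std::min<std::size_t>(context.size(), 3));
    return out;
  }

  // The common virtual method just drops the KenLM-specific part.
  LMResult GetValueForgotState(
      const std::vector<std::string>& context) const override {
    return GetValueForgotStateKen(context).result;
  }
};
```

Code that only knows the base class keeps calling GetValueForgotState(), so the high level API is unchanged; code that knows it holds the KenLM wrapper can call GetValueForgotStateKen() and read ngram_length.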

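The kenlm/query output format described in the quoted thread ("word=vocab_id ngram_length score") can also be checked mechanically. What follows is a rough C++ sketch that parses one output line and tests whether the last word was scored as a full n-gram. The names QueryToken, ParseQueryLine and EndsWithFullMatch are made up for this example and are not part of KenLM, and it assumes the fields are separated by whitespace as in the example from the thread.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One "word=vocab_id ngram_length score" entry from the query output.
struct QueryToken {
  std::string word;
  int vocab_id;
  int ngram_length;
  double score;
};

// Parse entries until something that is not "word=id" (e.g. "Total:") appears.
std::vector<QueryToken> ParseQueryLine(const std::string& line) {
  std::vector<QueryToken> tokens;
  std::istringstream in(line);
  std::string word_and_id;
  int ngram_length;
  double score;
  while (in >> word_and_id >> ngram_length >> score) {
    // Split on the last '=' so words containing '=' still parse.
    const std::string::size_type eq = word_and_id.rfind('=');
    if (eq == std::string::npos) break;
    QueryToken token;
    token.word = word_and_id.substr(0, eq);
    // std::stoi throws if the id part is not numeric.
    token.vocab_id = std::stoi(word_and_id.substr(eq + 1));
    token.ngram_length = ngram_length;
    token.score = score;
    tokens.push_back(token);
  }
  return tokens;
}

// True when the last word was scored with an n-gram of exactly `order` words.
bool EndsWithFullMatch(const std::vector<QueryToken>& tokens, int order) {
  return !tokens.empty() && tokens.back().ngram_length == order;
}
```

With the example line from the thread, ParseQueryLine yields three tokens and EndsWithFullMatch(tokens, 3) is true, because the last entry is "a=5 3".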