Re: [Moses-support] Using Moses language models

Kenneth Heafield Wed, 24 Aug 2011 02:54:46 -0700

I support depending on Boost but sadly some people don't. 
PhraseDictionaryTree.cpp:3 in your branch includes a boost header.


Kenneth

On 08/24/11 10:17, Marc LEGENDRE wrote:
> Hi,
>
> I merged the trunk into my branch; it looks ok.
> Could my small modification to LMKen.h/cpp finally be merged into the trunk?
> (not the useless changes to PhraseDictionaryTree)
>
> Thanks (and sorry for my slow response; I hope you remember me!)
>
> Marc
>
> ----- Original Message -----
>> From: "Hieu Hoang" <[email protected]>
>> To: "Marc LEGENDRE" <[email protected]>
>> Cc: "Kenneth Heafield" <[email protected]>, [email protected]
>> Sent: Wednesday, 27 July 2011 13:34:35
>> Subject: Re: [Moses-support] Using Moses language models
>>
>> hi marc,
>>
>> thx for the commits.
>>
>> the regression tests probably failed because the decoder wasn't compiled
>> with SRI or IRST LM, which some of the regression tests specify. I
>> compiled your branch & it passes.
>>
>> I suppose for convenience, we should change it to use KenLM, with
>> specific tests for IRST & SRI.
>>
>> On 25/07/2011 21:51, Marc LEGENDRE wrote:
>>> Well, I actually committed to the augmLMResult branch.
>>>
>>> I inserted a class between LMKen and LMSingleFactor to prevent the
>>> inclusion of kenlm headers.
>>> (And yes, I now realize this may be the kind of thing you write in
>>> a commit message)
>>>
>>> Since the LanguageModelKen.h header now contains functions I want to use,
>>> can we add it to the list of installed files? (And how?)
>>>
>>>
>>> Also, I can't get the regression tests to work.
>>> I downloaded the test data and extracted it in /tmp; I read what
>>> I found, and this is the command I came up with:
>>> ./regression-testing/run-test-suite.pl
>>> --decoder-phrase=moses-cmd/src/moses
>>> --decoder-chart=moses-chart-cmd/src/moses_chart
>>> But every test ends with a "MOSES CRASHED" message. (And the same
>>> thing happens with the trunk build.)
>>> I tried to understand, and I noticed that the .ini files for the tests
>>> contain:
>>> [lmodel-file]
>>> 0 0 3 moses-reg-test-data-5/lm/europarl.en.srilm.gz
>>>
>>> Is that OK for kenlm?
>>>
>>> Marc
>>>
>>> ----- Original Message -----
>>>> From: "Kenneth Heafield" <[email protected]>
>>>> To: "Marc LEGENDRE" <[email protected]>
>>>> Cc: [email protected], [email protected]
>>>> Sent: Friday, 22 July 2011 20:18:21
>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>
>>>> Hi Marc,
>>>>
>>>>    This sounds like a simple change, so a branch is probably too much
>>>> overhead.  Please do one of the following:
>>>>
>>>> 1. Send a patch as generated by diff -rupN $old $new .  Do a make clean
>>>> first.
>>>> 2. Attach the files you modified and send them, along with the revision
>>>> you based changes on.
>>>> 3. Make a branch (if you already did).
>>>>
>>>> Thanks,
>>>>
>>>> Kenneth
>>>>
>>>> On 07/22/11 04:21, Marc LEGENDRE wrote:
>>>>> Well, we (me and the people I work with) were hoping not to have to
>>>>> maintain a modified version of Moses.
>>>>>
>>>>> Luckily, obviousness just hit me like a truck: if something is
>>>>> specific to an LM, it does not have to be in the top layer.
>>>>> Having a common interface does not prevent subclasses from having
>>>>> specific behaviour: we could have a LanguageModelKen method, say
>>>>> GetValueForgotStateKen(...), which would return something specific,
>>>>> say an LMKenResult, which would contain an LMResult plus other things
>>>>> like, say, an ngram_length field :-).
>>>>> And the virtual GetValueForgotState() method would simply return
>>>>> the LMResult from there.
>>>>>
>>>>> This way, there is no need to break the high-level API,
>>>>> and no extra maintenance cost for us (me and the peop... Well, you
>>>>> know).
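
The layering Marc describes can be sketched roughly as follows. This is only
an illustration under simplifying assumptions, not the code that was
committed: the LMResult fields, the std::vector<std::string> context
parameter, and the stub scoring are hypothetical stand-ins (the real Moses
method takes different arguments, and a real wrapper would query KenLM).

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the generic Moses result type.
struct LMResult {
  float score;
  bool unknown;
};

// KenLM-specific result: the generic LMResult plus the n-gram length.
struct LMKenResult {
  LMResult base;
  unsigned char ngram_length;
};

// Common interface: knows nothing about KenLM-only information.
class LanguageModelImplementation {
 public:
  virtual ~LanguageModelImplementation() {}
  // Simplified signature, assumed for this sketch only.
  virtual LMResult GetValueForgotState(
      const std::vector<std::string> &context) const = 0;
};

class LanguageModelKen : public LanguageModelImplementation {
 public:
  // KenLM-only entry point returning the richer result.
  LMKenResult GetValueForgotStateKen(
      const std::vector<std::string> &context) const {
    LMKenResult ret;
    // Stub values: a real wrapper would ask KenLM for the score and for
    // the length of the n-gram that actually matched.
    ret.base.score = 0.0f;
    ret.base.unknown = context.empty();
    ret.ngram_length = static_cast<unsigned char>(context.size());
    return ret;
  }

  // The generic virtual method simply drops the KenLM-specific part.
  LMResult GetValueForgotState(
      const std::vector<std::string> &context) const override {
    return GetValueForgotStateKen(context).base;
  }
};
```

Callers that only hold a LanguageModelImplementation see nothing new, while
code that knows it has a LanguageModelKen can call the Ken-specific method
and read ngram_length.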
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Hieu Hoang" <[email protected]>
>>>>>> To: "Kenneth Heafield" <[email protected]>
>>>>>> Cc: [email protected]
>>>>>> Sent: Friday, 22 July 2011 04:50:14
>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>
>>>>>>
>>>>>> true, & there's no right answer to it.
>>>>>>
>>>>>> I suppose 1 goal of the trunk is to make sure that the core
>>>>>> functionality of translating isn't affected too much, in terms of
>>>>>> quality, speed, or memory. Another goal is not to overburden
>>>>>> the API with things no one else uses or implements.
>>>>>>
>>>>>> therefore, i think a good strategy is to branch & do what you like
>>>>>>
>>>>>>
>>>>>> On 21 July 2011 22:46, Kenneth Heafield <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> Marc makes a good point. When one language model provides more
>>>>>> information than do other language models, it's difficult to maintain
>>>>>> a common abstraction layer. Currently we're looking at n-gram length.
>>>>>> SRILM doesn't provide access to that (but you can get right-looking
>>>>>> state length which is usually the same thing).
>>>>>>
>>>>>> I'm working on making this issue more severe with left-looking state
>>>>>> optimization and explicit hypothesis bounds. How do we change the
>>>>>> decoder to use these features if not all of the language models
>>>>>> support them?
>>>>>>
>>>>>> Maybe another class in the language model hierarchy supporting these
>>>>>> additional features. But it's going to make the decoder look ugly if
>>>>>> you want to support both.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 07/21/11 11:14, Hieu Hoang wrote:
>>>>>>> hi marc,
>>>>>>>
>>>>>>> it'll be good for people to see your changes.
>>>>>>>
>>>>>>> i suppose you should create a branch and make your changes in there.
>>>>>>>
>>>>>>> If there are other people interested, you can point them to your
>>>>>>> branch.
>>>>>>> If more people are interested and it doesn't affect other people too
>>>>>>> much, then we can move it to trunk.
>>>>>>>
>>>>>>> i'll email you offline with svn details
>>>>>>>
>>>>>>> On 21/07/2011 15:16, Marc LEGENDRE wrote:
>>>>>>>> Alright, I gave this a try, and it did the job for me.
>>>>>>>> With kenlm, it is a ridiculously straightforward modification,
>>>>>>>> but now I'm not sure how I can submit it:
>>>>>>>> on one hand, I am not a "machine translation guy" and I don't
>>>>>>>> imagine myself digging in every other LM to find how to set the
>>>>>>>> ngram_length value;
>>>>>>>> and on the other hand I would feel guilty to submit a 10-line
>>>>>>>> patch and say "Guys, I need this, would you mind committing it and
>>>>>>>> making the necessary modifications in every other wrapper
>>>>>>>> yourselves?"
>>>>>>>>
>>>>>>>> How do you, Moses developers, feel about this?
>>>>>>>> Is it acceptable / outrageously stupid if I set the value to -1 in
>>>>>>>> the other wrappers, maybe with a TODO, and properly document it in
>>>>>>>> the super class?
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: "Kenneth Heafield" <[email protected]>
>>>>>>>>> To: [email protected]
>>>>>>>>> Sent: Wednesday, 13 July 2011 20:53:46
>>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>>>>
>>>>>>>>> I'd suggest adding an ngram_length member to LMResult then
>>>>>>>>> modifying each model's wrapper (or just mine) to set that value.
>>>>>>>>>
>>>>>>>>> You're welcome to move stuff from LanguageModelKen.cpp to
>>>>>>>>> LanguageModelKen.h as necessary. I chose this setup to minimize
>>>>>>>>> unnecessary includes.
>>>>>>>>>
>>>>>>>>> Kenneth
>>>>>>>>>
>>>>>>>>> On 07/13/11 14:33, Marc LEGENDRE wrote:
>>>>>>>>>> Well, not only is the header not "public", so to speak (which I
>>>>>>>>>> agree is not a major obstacle),
>>>>>>>>>> but the desired pointer is also a private member of the class,
>>>>>>>>>> and sadly lacks a getter.
>>>>>>>>>> As far as I know, it means that accessing it will involve
>>>>>>>>>> questionable C++ tricks.
>>>>>>>>>> (never tried, though)
>>>>>>>>>>
>>>>>>>>>> If modifying Moses is not too much of a chore, I'll give it a
>>>>>>>>>> thought.
>>>>>>>>>>
>>>>>>>>>> Anyway, thank you for your answers.
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> From: "Hieu Hoang" <[email protected]>
>>>>>>>>>>> To: [email protected]
>>>>>>>>>>> Sent: Wednesday, 13 July 2011 18:40:11
>>>>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>>>>>>
>>>>>>>>>>> i guess lm::Model is specific to the ken lm implementation. If
>>>>>>>>>>> you want to use it you should include the header yourself and
>>>>>>>>>>> cast whatever you need to get the pointer.
>>>>>>>>>>>
>>>>>>>>>>> if you're feeling generous, maybe you can extend the moses LM
>>>>>>>>>>> wrapper so that all LM implementations have the opportunity to
>>>>>>>>>>> return the length of the n-gram match.
>>>>>>>>>>>
>>>>>>>>>>> On 13/07/2011 21:51, Marc LEGENDRE wrote:
>>>>>>>>>>>> The length of the n-gram match is sufficient for what I want,
>>>>>>>>>>>> indeed.
>>>>>>>>>>>> I figured out how to get it using kenlm directly, but as I am
>>>>>>>>>>>> running the decoder, I wanted to use the already loaded LM.
>>>>>>>>>>>>
>>>>>>>>>>>> I first tried to dig my way through the Moses abstraction
>>>>>>>>>>>> layers to retrieve a pointer to a lm::Model from kenlm, but the
>>>>>>>>>>>> Moses::LanguageModelKen header is not part of the public
>>>>>>>>>>>> headers of Moses; that's why I tried to use only the Moses
>>>>>>>>>>>> interface.
>>>>>>>>>>>>
>>>>>>>>>>>> (I did not mention this alternative earlier; if someone knows
>>>>>>>>>>>> how to get such a pointer, I can carry on from there)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>> From: "Kenneth Heafield" <[email protected]>
>>>>>>>>>>>>> To: "Marc LEGENDRE" <[email protected]>
>>>>>>>>>>>>> Sent: Wednesday, 13 July 2011 16:12:27
>>>>>>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>>>>>>>>
>>>>>>>>>>>>> The definition of unknown is that the word you asked for (the
>>>>>>>>>>>>> rightmost one) is mapped to <unk>, i.e. an OOV.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you looking for:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1) Length of n-gram matched in the model
>>>>>>>>>>>>>
>>>>>>>>>>>>> or
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2) Length of state you must keep for valid continuation to the
>>>>>>>>>>>>> right
>>>>>>>>>>>>>
>>>>>>>>>>>>> These are slightly different things due to state minimization.
>>>>>>>>>>>>> The moses abstraction layer does not return either in a general
>>>>>>>>>>>>> way. However, if you're using KenLM, #2 is in the returned
>>>>>>>>>>>>> state's valid_length_. Further, #1 is in
>>>>>>>>>>>>> FullScoreReturn.ngram_length. So if you call KenLM directly
>>>>>>>>>>>>> these are easy to obtain (and you can decide whether to expose
>>>>>>>>>>>>> them through the Moses abstraction layer).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Outside the decoder, you can run
>>>>>>>>>>>>>
>>>>>>>>>>>>> kenlm/query model_file null
>>>>>>>>>>>>>
>>>>>>>>>>>>> then provide your trigrams on stdin.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here's an example with kenlm/query kenlm/lm/test.arpa null:
>>>>>>>>>>>>>
>>>>>>>>>>>>> looking on a
>>>>>>>>>>>>> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
>>>>>>>>>>>>> Total: -1.79818 OOV: 0
>>>>>>>>>>>>>
>>>>>>>>>>>>> The format is "word=vocab_id ngram_length score". So this is a
>>>>>>>>>>>>> trigram in the model because "a=5 3" appears.
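
For anyone scripting this check, the "word=vocab_id ngram_length score"
format quoted above could be parsed along the following lines. This is a
minimal sketch, not part of KenLM: ScoredWord, ParseQueryLine, and
LastWordMatched are hypothetical names, and the parser assumes the per-word
triples come first and simply stops at the first token without an '='
(such as "Total:").

```cpp
#include <sstream>
#include <string>
#include <vector>

// One "word=vocab_id ngram_length score" triple from the query output.
struct ScoredWord {
  std::string word;
  unsigned vocab_id;
  unsigned ngram_length;
  double score;
};

// Parse the per-word triples from one line of query output.
std::vector<ScoredWord> ParseQueryLine(const std::string &line) {
  std::vector<ScoredWord> out;
  std::istringstream in(line);
  std::string token;
  while (in >> token) {
    // Split on the last '=' so a word containing '=' is kept whole.
    std::string::size_type eq = token.rfind('=');
    if (eq == std::string::npos) break;  // e.g. "Total:" ends the triples
    ScoredWord w;
    w.word = token.substr(0, eq);
    std::istringstream id(token.substr(eq + 1));
    if (!(id >> w.vocab_id)) break;
    if (!(in >> w.ngram_length >> w.score)) break;
    out.push_back(w);
  }
  return out;
}

// True when the last word was scored using a match of length >= n,
// i.e. the full n-gram ending at that word is in the model.
bool LastWordMatched(const std::vector<ScoredWord> &words, unsigned n) {
  return !words.empty() && words.back().ngram_length >= n;
}
```

With the example line above, the last triple is "a=5 3 -0.0483513", so
LastWordMatched(words, 3) holds and the trigram is in the model.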
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 07/13/11 08:50, Marc LEGENDRE wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am trying to use the language models loaded by Moses.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am using a 3-gram LM, and I need to know whether it contains
>>>>>>>>>>>>>> a given N-gram or not.
>>>>>>>>>>>>>> I tried to play around with
>>>>>>>>>>>>>> LanguageModelImplementation::GetValueForgotState(...),
>>>>>>>>>>>>>> but the boolean 'unknown' in the returned structure does not
>>>>>>>>>>>>>> seem to be what I'm looking for.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there any simple way of getting this piece of information?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Marc Legendre
>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>> Moses-support mailing list
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
