Re: [Moses-support] Using Moses language models

Kenneth Heafield Wed, 24 Aug 2011 03:38:27 -0700

Sorry about the spam.  Should have remembered you said to ignore
PhraseDictionaryTree.  FWIW, you can use std::auto_ptr from #include
<memory> but that's set to be deprecated with C++0x.
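
A minimal sketch of the same ownership idea, assuming a compiler with
C++0x support, where std::unique_ptr (also in <memory>) is the intended
replacement for std::auto_ptr.  Table and LoadTable are hypothetical names
for illustration, not Moses code:

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a heap-allocated resource; not a Moses class.
struct Table {
  explicit Table(const std::string &path) : path_(path) { ++live; }
  ~Table() { --live; }
  std::string path_;
  static int live;  // number of Table objects currently alive
};
int Table::live = 0;

// The caller receives ownership. The Table is deleted automatically when
// the returned pointer goes out of scope, so no explicit delete is needed.
std::unique_ptr<Table> LoadTable(const std::string &path) {
  return std::unique_ptr<Table>(new Table(path));
}
```

The live counter is only there to make the cleanup observable; it drops
back to zero once the owning pointer is destroyed.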


Merged your memory leak fix in a slightly different way.  Also, since
I'm merging only part of the branch, do you mind if it says my name on
the change but the commentary says you?  Or you can teach me more svn...

Kenneth

On 08/24/11 11:19, Marc LEGENDRE wrote:
> Yes, I understood this from another discussion.
> The point in PhraseDictionaryTree.cpp was just memory management.
> (admittedly, to silence Valgrind; but hey, don't we all strive for
> perfection? :-)
>
> I don't need this; I guess I should have removed it from my branch if I
> wanted to merge.
> It's done.
>
> ----- Original Message -----
>> From: "Kenneth Heafield" <[email protected]>
>> To: [email protected]
>> Sent: Wednesday, 24 August 2011 11:52:19
>> Subject: Re: [Moses-support] Using Moses language models
>>
>> I support depending on Boost but sadly some people don't.
>> PhraseDictionaryTree.cpp:3 in your branch includes a boost header.
>>
>> Kenneth
>>
>> On 08/24/11 10:17, Marc LEGENDRE wrote:
>>> Hi,
>>>
>>> I merged the trunk into my branch; it looks OK.
>>> Could my small modification to LMKen.h/cpp finally be merged into
>>> the trunk?
>>> (not the useless changes to PhraseDictionaryTree)
>>>
>>> Thanks, (and sorry for my slow response, I hope you remember me!)
>>>
>>> Marc
>>>
>>> ----- Original Message -----
>>>> From: "Hieu Hoang" <[email protected]>
>>>> To: "Marc LEGENDRE" <[email protected]>
>>>> Cc: "Kenneth Heafield" <[email protected]>, [email protected]
>>>> Sent: Wednesday, 27 July 2011 13:34:35
>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>
>>>> hi marc,
>>>>
>>>> thx for the commits.
>>>>
>>>> the regression test probably failed because the decoder wasn't
>>>> compiled with SRI or IRST LM, which some of the regression tests
>>>> specify. I compiled your branch & it passes.
>>>>
>>>> I suppose for convenience, we should change it to use KenLM, with
>>>> specific tests for IRST & SRI.
>>>>
>>>> On 25/07/2011 21:51, Marc LEGENDRE wrote:
>>>>> Well, I actually committed to the augmLMResult branch.
>>>>>
>>>>> I inserted a class between LMKen and LMSingleFactor to prevent the
>>>>> inclusion of kenlm headers.
>>>>> (And yes, I now realize this may be the kind of thing you write in
>>>>> a commit message)
>>>>>
>>>>> Since the LanguageModelKen.h header now contains functions I want to
>>>>> use, can we add it to the list of installed files? (And how?)
>>>>>
>>>>>
>>>>> Also, I can't get the regression tests to work.
>>>>> I downloaded the test data and extracted it in /tmp; I read what
>>>>> I found, and this is the command I came up with:
>>>>>   ./regression-testing/run-test-suite.pl \
>>>>>     --decoder-phrase=moses-cmd/src/moses \
>>>>>     --decoder-chart=moses-chart-cmd/src/moses_chart
>>>>> But every test ends with a "MOSES CRASHED" message. (And the same
>>>>> thing happens with the trunk build.)
>>>>> I tried to understand, and I noticed that the .ini files for the
>>>>> tests contain:
>>>>> [lmodel-file]
>>>>> 0 0 3 moses-reg-test-data-5/lm/europarl.en.srilm.gz
>>>>>
>>>>> Is that OK for kenlm ?
>>>>>
>>>>> Marc
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Kenneth Heafield" <[email protected]>
>>>>>> To: "Marc LEGENDRE" <[email protected]>
>>>>>> Cc: [email protected], [email protected]
>>>>>> Sent: Friday, 22 July 2011 20:18:21
>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>
>>>>>> Hi Marc,
>>>>>>
>>>>>> This sounds like a simple change, so a branch is probably too much
>>>>>> overhead.  Please do one of the following:
>>>>>>
>>>>>> 1. Send a patch as generated by diff -rupN $old $new .  Do a make
>>>>>>    clean first.
>>>>>> 2. Attach the files you modified and send them, along with the
>>>>>>    revision you based changes on.
>>>>>> 3. Make a branch (if you already did).
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Kenneth
>>>>>>
>>>>>> On 07/22/11 04:21, Marc LEGENDRE wrote:
>>>>>>> Well, we (me and the people I work with) were hoping not to have
>>>>>>> to maintain a modified version of Moses.
>>>>>>>
>>>>>>> Luckily, obviousness just hit me like a truck: if something is
>>>>>>> specific to an LM, it does not have to be in the top layer.
>>>>>>> Having a common interface does not prevent subclasses from having
>>>>>>> specific behaviour. We could have a LanguageModelKen method, say
>>>>>>> GetValueForgotStateKen(...), which would return something
>>>>>>> specific, say an LMKenResult, which would contain an LMResult
>>>>>>> plus other things like, say, an ngram_length field :-).
>>>>>>> And the virtual GetValueForgotState() method would simply return
>>>>>>> the LMResult from there.
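
The design described just above could look roughly like the following.
This is a sketch under loud assumptions, not the code in the branch:
LMResult is reduced to two fields (score is assumed; unknown is the flag
mentioned in this thread), GetValueForgotState is simplified to take the
n-gram as a vector of words, and the "model" is a toy lookup table rather
than a real KenLM model.  Add and table_ are hypothetical, and backoff
weights are ignored.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Simplified stand-in for the generic Moses result type.
struct LMResult {
  float score;
  bool unknown;
};

// KenLM-specific result: the generic part plus the matched n-gram length.
struct LMKenResult {
  LMResult result;
  unsigned int ngram_length;
};

// Generic interface: knows nothing about n-gram lengths.
class LanguageModelImplementation {
 public:
  virtual ~LanguageModelImplementation() {}
  virtual LMResult GetValueForgotState(
      const std::vector<std::string> &ngram) const = 0;
};

class LanguageModelKen : public LanguageModelImplementation {
 public:
  // Toy replacement for loading a model: register an n-gram and its score.
  void Add(const std::string &ngram, float score) { table_[ngram] = score; }

  // Specific entry point: tries the full n-gram first, then drops words
  // from the left until a known suffix is found.
  LMKenResult GetValueForgotStateKen(
      const std::vector<std::string> &ngram) const {
    for (std::size_t start = 0; start < ngram.size(); ++start) {
      std::string key;
      for (std::size_t i = start; i < ngram.size(); ++i) {
        if (i > start) key += ' ';
        key += ngram[i];
      }
      std::map<std::string, float>::const_iterator it = table_.find(key);
      if (it != table_.end()) {
        LMKenResult out;
        out.result.score = it->second;
        out.result.unknown = false;
        out.ngram_length = static_cast<unsigned int>(ngram.size() - start);
        return out;
      }
    }
    LMKenResult oov;
    oov.result.score = -100.0f;  // placeholder score for an unknown word
    oov.result.unknown = true;
    oov.ngram_length = 0;
    return oov;
  }

  // Generic entry point: same computation, extra information dropped.
  virtual LMResult GetValueForgotState(
      const std::vector<std::string> &ngram) const {
    return GetValueForgotStateKen(ngram).result;
  }

 private:
  std::map<std::string, float> table_;
};
```

Code that only knows the base class keeps calling GetValueForgotState();
code that holds a LanguageModelKen can call GetValueForgotStateKen() and
read ngram_length, so the high-level API stays unchanged.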
>>>>>>>
>>>>>>> This way, there is no need to break the high-level API,
>>>>>>> and no extra maintenance cost for us (me and the peop... Well,
>>>>>>> you know).
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Hieu Hoang" <[email protected]>
>>>>>>>> To: "Kenneth Heafield" <[email protected]>
>>>>>>>> Cc: [email protected]
>>>>>>>> Sent: Friday, 22 July 2011 04:50:14
>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>>>
>>>>>>>>
>>>>>>>> true, & there's no right answer to it.
>>>>>>>>
>>>>>>>> I suppose 1 goal of the trunk is to make sure that the core
>>>>>>>> functionality of translating isn't affected too much, in terms
>>>>>>>> of quality, speed, or memory. Another goal is to make sure not
>>>>>>>> to overburden the API with things no-one else uses or implements.
>>>>>>>>
>>>>>>>> therefore, i think a good strategy is to branch & do what you
>>>>>>>> like
>>>>>>>>
>>>>>>>>
>>>>>>>> On 21 July 2011 22:46, Kenneth Heafield <[email protected]> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Marc makes a good point. When one language model provides more
>>>>>>>> information than other language models do, it's difficult to
>>>>>>>> maintain a common abstraction layer. Currently we're looking at
>>>>>>>> n-gram length. SRILM doesn't provide access to that (but you can
>>>>>>>> get right-looking state length, which is usually the same thing).
>>>>>>>>
>>>>>>>> I'm working on making this issue more severe with left-looking
>>>>>>>> state optimization and explicit hypothesis bounds. How do we
>>>>>>>> change the decoder to use these features if not all of the
>>>>>>>> language models support them?
>>>>>>>>
>>>>>>>> Maybe another class in the language model hierarchy supporting
>>>>>>>> these additional features. But it's going to make the decoder
>>>>>>>> look ugly if you want to support both.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 07/21/11 11:14, Hieu Hoang wrote:
>>>>>>>>> hi marc,
>>>>>>>>>
>>>>>>>>> it'll be good for people to see your changes.
>>>>>>>>>
>>>>>>>>> i suppose you should create a branch and make your changes in
>>>>>>>>> there.
>>>>>>>>>
>>>>>>>>> If there are other people interested, you can point them to
>>>>>>>>> your branch.
>>>>>>>>> If more people are interested and it doesn't affect other
>>>>>>>>> people too much, then we can move it to trunk.
>>>>>>>>>
>>>>>>>>> i'll email you offline with svn details
>>>>>>>>>
>>>>>>>>> On 21/07/2011 15:16, Marc LEGENDRE wrote:
>>>>>>>>>> Alright, I gave this a try, and it did the job for me.
>>>>>>>>>> With kenlm, it is a ridiculously straightforward modification,
>>>>>>>>>> but now I'm not sure how I can submit it:
>>>>>>>>>> on one hand, I am not a "machine translation guy" and I can't
>>>>>>>>>> imagine myself digging into every other LM to find how to set
>>>>>>>>>> the ngram_length value;
>>>>>>>>>> and on the other hand I would feel guilty submitting a 10-line
>>>>>>>>>> patch and saying
>>>>>>>>>> "Guys, I need this, would you mind committing it and making
>>>>>>>>>> the necessary modifications in every other wrapper yourselves?"
>>>>>>>>>>
>>>>>>>>>> How do you, Moses developers, feel about this?
>>>>>>>>>> Is it acceptable / outrageously stupid if I set the value to
>>>>>>>>>> -1 in the other wrappers, maybe with a TODO, and properly
>>>>>>>>>> document it in the superclass?
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> From: "Kenneth Heafield" <[email protected]>
>>>>>>>>>>> To: [email protected]
>>>>>>>>>>> Sent: Wednesday, 13 July 2011 20:53:46
>>>>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>>>>>>
>>>>>>>>>>> I'd suggest adding an ngram_length member to LMResult, then
>>>>>>>>>>> modifying each model's wrapper (or just mine) to set that
>>>>>>>>>>> value.
>>>>>>>>>>>
>>>>>>>>>>> You're welcome to move stuff from LanguageModelKen.cpp to
>>>>>>>>>>> LanguageModelKen.h as necessary. I chose this setup to
>>>>>>>>>>> minimize unnecessary includes.
>>>>>>>>>>>
>>>>>>>>>>> Kenneth
>>>>>>>>>>>
>>>>>>>>>>> On 07/13/11 14:33, Marc LEGENDRE wrote:
>>>>>>>>>>>> Well, not only is the header not "public", so to speak
>>>>>>>>>>>> (which I agree is not a major obstacle), but the desired
>>>>>>>>>>>> pointer is also a private member of the class, and sadly
>>>>>>>>>>>> lacks a getter.
>>>>>>>>>>>> As far as I know, that means accessing it will involve
>>>>>>>>>>>> questionable C++ tricks.
>>>>>>>>>>>> (never tried, though)
>>>>>>>>>>>>
>>>>>>>>>>>> If modifying Moses is not too much of a chore, I'll give it
>>>>>>>>>>>> a thought.
>>>>>>>>>>>>
>>>>>>>>>>>> Anyway, thank you for your answers.
>>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>> From: "Hieu Hoang" <[email protected]>
>>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>>> Sent: Wednesday, 13 July 2011 18:40:11
>>>>>>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>>>>>>>>
>>>>>>>>>>>>> i guess lm::Model is specific to the ken lm implementation.
>>>>>>>>>>>>> If you want to use it you should include the header
>>>>>>>>>>>>> yourself and cast whatever you need to get the pointer.
>>>>>>>>>>>>>
>>>>>>>>>>>>> if you're feeling generous, maybe you can extend the moses
>>>>>>>>>>>>> LM wrapper so that all LM implementations have the
>>>>>>>>>>>>> opportunity to return the length of the n-gram match.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 13/07/2011 21:51, Marc LEGENDRE wrote:
>>>>>>>>>>>>>> The length of the n-gram match is sufficient for what I
>>>>>>>>>>>>>> want, indeed.
>>>>>>>>>>>>>> I figured out how to get it using kenlm directly, but as
>>>>>>>>>>>>>> I am running the decoder, I wanted to use the already
>>>>>>>>>>>>>> loaded LM.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I first tried to dig my way through the Moses abstraction
>>>>>>>>>>>>>> layers to retrieve a pointer to a lm::Model from kenlm,
>>>>>>>>>>>>>> but the Moses::LanguageModelKen header is not part of the
>>>>>>>>>>>>>> public headers of Moses; that's why I tried to use only
>>>>>>>>>>>>>> the Moses interface.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (I did not mention this alternative; if someone knows how
>>>>>>>>>>>>>> to get such a pointer, I can carry on from there)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>>> From: "Kenneth Heafield" <[email protected]>
>>>>>>>>>>>>>>> To: "Marc LEGENDRE" <[email protected]>
>>>>>>>>>>>>>>> Sent: Wednesday, 13 July 2011 16:12:27
>>>>>>>>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The definition of unknown is that the word you asked for
>>>>>>>>>>>>>>> (the rightmost one) is mapped to <unk>, i.e. an OOV.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Are you looking for:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 1) Length of n-gram matched in the model
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2) Length of state you must keep for valid continuation
>>>>>>>>>>>>>>>    to the right
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> These are slightly different things due to state
>>>>>>>>>>>>>>> minimization. The moses abstraction layer does not
>>>>>>>>>>>>>>> return either in a general way. However, if you're using
>>>>>>>>>>>>>>> KenLM, #2 is in the returned state's valid_length_.
>>>>>>>>>>>>>>> Further, #1 is in FullScoreReturn.ngram_length. So if
>>>>>>>>>>>>>>> you call KenLM directly these are easy to obtain (and
>>>>>>>>>>>>>>> you can decide whether to expose them through the Moses
>>>>>>>>>>>>>>> abstraction layer).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Outside the decoder, you can run
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> kenlm/query model_file null
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> then provide your trigrams on stdin.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Here's an example with kenlm/query kenlm/lm/test.arpa null
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> looking on a
>>>>>>>>>>>>>>> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
>>>>>>>>>>>>>>> Total: -1.79818 OOV: 0
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The format is "word=vocab_id ngram_length score". So
>>>>>>>>>>>>>>> this is a trigram in the model because "a=5 3" appears.
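
For anyone who wants to read that output programmatically, here is a
small sketch of a parser for the per-word line, based only on the
"word=vocab_id ngram_length score" format described just above.
QueryToken and ParseQueryLine are hypothetical names, and summary lines
such as "Total: ..." are not handled.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One "word=vocab_id ngram_length score" entry from the query output.
struct QueryToken {
  std::string word;
  unsigned int vocab_id;
  unsigned int ngram_length;
  double score;
};

// Splits one output line into entries. Parsing stops at the first entry
// that does not match the expected shape.
std::vector<QueryToken> ParseQueryLine(const std::string &line) {
  std::vector<QueryToken> out;
  std::istringstream in(line);
  std::string word_and_id;
  QueryToken tok;
  while (in >> word_and_id >> tok.ngram_length >> tok.score) {
    // Split at the last '=' so that words containing '=' still parse.
    std::string::size_type eq = word_and_id.rfind('=');
    if (eq == std::string::npos) break;
    tok.word = word_and_id.substr(0, eq);
    std::istringstream id(word_and_id.substr(eq + 1));
    if (!(id >> tok.vocab_id)) break;
    out.push_back(tok);
  }
  return out;
}
```

Checking whether the full trigram is in the model then amounts to testing
that the last entry's ngram_length equals 3.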
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 07/13/11 08:50, Marc LEGENDRE wrote:
>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am trying to use the language models loaded by Moses.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am using a 3-gram LM, and I need to know whether it
>>>>>>>>>>>>>>>> contains a given N-gram or not.
>>>>>>>>>>>>>>>> I tried to play around with
>>>>>>>>>>>>>>>> LanguageModelImplementation::GetValueForgotState(...),
>>>>>>>>>>>>>>>> but the boolean 'unknown' in the returned structure
>>>>>>>>>>>>>>>> does not seem to be what I'm looking for.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is there any simple way of getting this piece of
>>>>>>>>>>>>>>>> information?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Marc Legendre
>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>> Moses-support mailing list
>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
