Re: [Moses-support] Using Moses language models

Hieu Hoang Wed, 27 Jul 2011 04:36:06 -0700

hi marc,

thx for the commits.


the regression tests probably failed because the decoder wasn't compiled 
with SRI or IRST LM, which some of the regression tests require. I 
compiled your branch & it passes.

I suppose for convenience, we should change the regression tests to use 
KenLM, keeping specific tests for IRST & SRI.

On 25/07/2011 21:51, Marc LEGENDRE wrote:
> Well, I actually committed in the augmLMResult branch.
>
> I inserted a class between LMKen and LMSingleFactor to prevent the inclusion 
> of kenlm headers.
> (And yes, I now realize this may be the kind of thing you write in a commit 
> message)
>
> Since the LanguageModelKen.h header now contains functions I want to use,
> can we add it to the list of installed files? (And how?)
>
>
> Also, I can't get the regression tests to work.
> I downloaded the test data & extracted it in /tmp; I read what I found, 
> and this is the command I came up with:
> ./regression-testing/run-test-suite.pl --decoder-phrase=moses-cmd/src/moses 
> --decoder-chart=moses-chart-cmd/src/moses_chart
> But every test ends with a "MOSES CRASHED" message. (And the same thing 
> happens with the trunk build)
> I tried to understand, and I noticed that .ini files for the tests contain :
> [lmodel-file]
> 0 0 3 moses-reg-test-data-5/lm/europarl.en.srilm.gz
>
> Is that OK for kenlm ?
>
> Marc
>
> ----- Original Message -----
>> From: "Kenneth Heafield"<[email protected]>
>> To: "Marc LEGENDRE"<[email protected]>
>> Cc:[email protected],[email protected]
>> Sent: Friday 22 July 2011 20:18:21
>> Subject: Re: [Moses-support] Using Moses language models
>>
>> Hi Marc,
>>
>>      This sounds like a simple change, so a branch is probably too much
>> overhead.  Please do one of the following:
>>
>> 1. Send a patch as generated by diff -rupN $old $new .  Do a make clean
>>    first.
>> 2. Attach the files you modified and send them, along with the revision
>>    you based changes on.
>> 3. Make a branch (if you already did).
>>
>> Thanks,
>>
>> Kenneth
>>
>> On 07/22/11 04:21, Marc LEGENDRE wrote:
>>> Well, we (me and the people I work with) were hoping not to have to
>>> maintain a modified version of Moses.
>>>
>>> Luckily, obviousness just hit me like a truck: if something is specific
>>> to a LM, it does not have to be in the top layer.
>>> Having a common interface does not prevent subclasses from having
>>> specific behaviour: we could have a LanguageModelKen method, say
>>> GetValueForgotStateKen(...), which would return something specific, say
>>> a LMKenResult, which would contain a LMResult plus other things like,
>>> say, a ngram_length field :-).
>>> And the virtual GetValueForgotState() method would simply return the
>>> LMResult from there.
>>>
>>> This way, no need to break the high level API,
>>> and no extra maintenance cost for us (me and the peop... Well, you
>>> know).
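
A minimal sketch of the layering described above, assuming simplified stand-in types. The names LMResult, LMKenResult, LanguageModelKen, GetValueForgotState and GetValueForgotStateKen come from the thread; the field layout, the argument type and the placeholder scoring are assumptions for illustration only and do not reflect the real Moses headers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the common Moses result type.
struct LMResult {
    float score = 0.0f;
    bool unknown = false;
};

// KenLM-specific result: the common LMResult plus the extra field.
struct LMKenResult {
    LMResult base;
    int ngram_length = 0;
};

// Common interface shared by every language model wrapper (simplified).
class LanguageModelImplementation {
public:
    virtual ~LanguageModelImplementation() = default;
    virtual LMResult GetValueForgotState(
        const std::vector<std::string>& context) const = 0;
};

class LanguageModelKen : public LanguageModelImplementation {
public:
    // KenLM-only entry point that returns the richer result.
    LMKenResult GetValueForgotStateKen(
        const std::vector<std::string>& context) const {
        LMKenResult r;
        // Placeholder values: a real wrapper would query KenLM here.
        r.base.score = -1.0f * static_cast<float>(context.size());
        r.base.unknown = context.empty();
        r.ngram_length = static_cast<int>(context.size());
        return r;
    }

    // The virtual method simply forwards and drops the extra field,
    // so the high level API stays unchanged.
    LMResult GetValueForgotState(
        const std::vector<std::string>& context) const override {
        return GetValueForgotStateKen(context).base;
    }
};
```

Code that only knows the base class keeps calling GetValueForgotState(), while code that holds a LanguageModelKen can ask for the extra field without any change to the other wrappers.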
>>>
>>> ----- Original Message -----
>>>> From: "Hieu Hoang"<[email protected]>
>>>> To: "Kenneth Heafield"<[email protected]>
>>>> Cc:[email protected]
>>>> Sent: Friday 22 July 2011 04:50:14
>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>
>>>>
>>>> true, & there's no right answer to it.
>>>>
>>>> I suppose 1 goal of the trunk is to make sure that the core
>>>> functionality of translating isn't affected too much, in terms of
>>>> quality, speed, or memory. Another goal is to make sure not to
>>>> overburden the API with things no-one else uses or implements.
>>>>
>>>> therefore, i think a good strategy is to branch & do what you like
>>>>
>>>>
>>>> On 21 July 2011 22:46, Kenneth Heafield <[email protected]> wrote:
>>>>
>>>>
>>>> Marc makes a good point. When one language model provides more
>>>> information than do other language models, it's difficult to maintain
>>>> a common abstraction layer. Currently we're looking at n-gram length.
>>>> SRILM doesn't provide access to that (but you can get right-looking
>>>> state length which is usually the same thing).
>>>>
>>>> I'm working on making this issue more severe with left-looking state
>>>> optimization and explicit hypothesis bounds. How do we change the
>>>> decoder to use these features if not all of the language models
>>>> support them?
>>>>
>>>> Maybe another class in the language model hierarchy supporting these
>>>> additional features. But it's going to make the decoder look ugly if
>>>> you want to support both.
>>>>
>>>>
>>>>
>>>>
>>>> On 07/21/11 11:14, Hieu Hoang wrote:
>>>>> hi marc,
>>>>>
>>>>> it'll be good for people to see your changes.
>>>>>
>>>>> i suppose you should create a branch and make your changes in there.
>>>>>
>>>>> If there are other people interested, you can point them to your
>>>>> branch.
>>>>> If more people are interested and it doesn't affect other people too
>>>>> much, then we can move it to trunk.
>>>>>
>>>>> i'll email you offline with svn details
>>>>>
>>>>> On 21/07/2011 15:16, Marc LEGENDRE wrote:
>>>>>> Alright, I gave this a try, and it did it for me.
>>>>>> With kenlm, it is a ridiculously straightforward modification,
>>>>>> but now I'm not sure how I can submit it:
>>>>>> on one hand, I am not a "machine translation guy" and I don't
>>>>>> imagine myself digging in every other LM to find how to set the
>>>>>> ngram_length value;
>>>>>> and on the other hand I would feel guilty to submit a 10-line patch
>>>>>> and say "Guys, I need this, would you mind committing it and doing
>>>>>> the necessary modifications in every other wrapper yourselves?"
>>>>>>
>>>>>> How do you, Moses developers, feel about this ?
>>>>>> Is it acceptable / outrageously stupid if I set the value to -1 in
>>>>>> the other wrappers,
>>>>>> maybe with a TODO, and properly document it in the super class ?
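
A rough sketch of the "-1 in the other wrappers" option asked about above, assuming a simplified LMResult. Only the ngram_length member and the 'unknown' flag are named in the thread; the constant kNGramLengthUnknown and the helper functions are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// Documented sentinel: the wrapper cannot report a match length.
constexpr int kNGramLengthUnknown = -1;

// Hypothetical, simplified version of the shared result structure.
struct LMResult {
    float score = 0.0f;
    bool unknown = false;
    // Length of the n-gram matched in the model, or kNGramLengthUnknown.
    int ngram_length = kNGramLengthUnknown;
};

// A wrapper that knows the match length fills the field in.
LMResult MakeResultWithLength(float score, int matched_length) {
    LMResult r;
    r.score = score;
    r.ngram_length = matched_length;
    return r;
}

// Every other wrapper leaves the default sentinel untouched.
LMResult MakeResultWithoutLength(float score) {
    LMResult r;
    r.score = score;
    return r;
}

// Callers check the sentinel before relying on the value.
bool HasNGramLength(const LMResult& r) {
    return r.ngram_length != kNGramLengthUnknown;
}
```

The cost of this design is that every caller has to check the sentinel, which is the "common abstraction layer" trade-off raised elsewhere in the thread.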
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Kenneth Heafield"<  [email protected]  >
>>>>>>> To:[email protected]
>>>>>>> Sent: Wednesday 13 July 2011 20:53:46
>>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>>
>>>>>>> I'd suggest adding a ngram_length member to LMResult then modifying
>>>>>>> each model's wrapper (or just mine) to set that value.
>>>>>>>
>>>>>>> You're welcome to move stuff from LanguageModelKen.cpp to
>>>>>>> LanguageModelKen.h as necessary. I chose this setup to minimize
>>>>>>> unnecessary includes.
>>>>>>>
>>>>>>> Kenneth
>>>>>>>
>>>>>>> On 07/13/11 14:33, Marc LEGENDRE wrote:
>>>>>>>> Well, not only is the header not "public", so to speak (which I
>>>>>>>> agree is not a major obstacle),
>>>>>>>> but the desired pointer is also a private member of the class, and
>>>>>>>> sadly lacks a getter.
>>>>>>>> As far as I know, that means accessing it will involve
>>>>>>>> questionable C++ tricks.
>>>>>>>> (never tried, though)
>>>>>>>>
>>>>>>>> If modifying Moses is not too much of a chore, I'll give it a
>>>>>>>> thought.
>>>>>>>>
>>>>>>>> Anyway, thank you for your answers.
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> From: "Hieu Hoang"<  [email protected]  >
>>>>>>>>> To:[email protected]
>>>>>>>>> Sent: Wednesday 13 July 2011 18:40:11
>>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>>>>
>>>>>>>>> i guess lm::Model is specific to the ken lm implementation. If you
>>>>>>>>> want to use it you should include the header yourself and cast
>>>>>>>>> whatever you need to get the pointer.
>>>>>>>>>
>>>>>>>>> if you're feeling generous, maybe you can extend the moses LM
>>>>>>>>> wrapper so that all LM implementations have the opportunity to
>>>>>>>>> return the length of the n-gram match.
>>>>>>>>>
>>>>>>>>> On 13/07/2011 21:51, Marc LEGENDRE wrote:
>>>>>>>>>> The length of the n-gram match is sufficient for what I want,
>>>>>>>>>> indeed.
>>>>>>>>>> I figured out how to get it using kenlm directly, but as I am
>>>>>>>>>> running the decoder, I wanted to use the already loaded LM.
>>>>>>>>>>
>>>>>>>>>> I first tried to dig my way through the Moses abstraction layers
>>>>>>>>>> to retrieve a pointer to a lm::Model from kenlm, but the
>>>>>>>>>> Moses::LanguageModelKen header is not part of the public headers
>>>>>>>>>> of Moses; that's why I tried to use only the Moses interface.
>>>>>>>>>>
>>>>>>>>>> (I did not mention this alternative; if someone knows how to get
>>>>>>>>>> such a pointer, I can carry on from there)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> From: "Kenneth Heafield"<  [email protected]  >
>>>>>>>>>>> To: "Marc LEGENDRE"<  [email protected]  >
>>>>>>>>>>> Sent: Wednesday 13 July 2011 16:12:27
>>>>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>>>>>>>>>>
>>>>>>>>>>> The definition of unknown is that the word you asked for (the
>>>>>>>>>>> rightmost one) is mapped to <unk>, i.e. an OOV.
>>>>>>>>>>>
>>>>>>>>>>> Are you looking for:
>>>>>>>>>>>
>>>>>>>>>>> 1) Length of n-gram matched in the model
>>>>>>>>>>>
>>>>>>>>>>> or
>>>>>>>>>>>
>>>>>>>>>>> 2) Length of state you must keep for valid continuation to the
>>>>>>>>>>>    right
>>>>>>>>>>>
>>>>>>>>>>> These are slightly different things due to state minimization.
>>>>>>>>>>> The moses abstraction layer does not return either in a general
>>>>>>>>>>> way. However, if you're using KenLM, #2 is in the returned
>>>>>>>>>>> state's valid_length_. Further, #1 is in
>>>>>>>>>>> FullScoreReturn.ngram_length. So if you call KenLM directly
>>>>>>>>>>> these are easy to obtain (and you can decide whether to expose
>>>>>>>>>>> them through the Moses abstraction layer).
>>>>>>>>>>>
>>>>>>>>>>> Outside the decoder, you can run
>>>>>>>>>>>
>>>>>>>>>>> kenlm/query model_file null
>>>>>>>>>>>
>>>>>>>>>>> then provide your trigrams on stdin.
>>>>>>>>>>>
>>>>>>>>>>> Here's an example with kenlm/query kenlm/lm/test.arpa null
>>>>>>>>>>>
>>>>>>>>>>> looking on a
>>>>>>>>>>> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
>>>>>>>>>>> Total: -1.79818 OOV: 0
>>>>>>>>>>>
>>>>>>>>>>> The format is "word=vocab_id ngram_length score". So this is a
>>>>>>>>>>> trigram
>>>>>>>>>>> in the model because "a=5 3" appears.
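
To make the "word=vocab_id ngram_length score" format concrete, here is a small sketch that parses one per-word output line like the example above and checks the ngram_length of the last word. The struct and function names are hypothetical and not part of KenLM; the parser only assumes the whitespace-separated layout shown in the example.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One "word=vocab_id ngram_length score" triple from the query output.
struct ScoredWord {
    std::string word;
    unsigned long vocab_id = 0;
    int ngram_length = 0;
    double score = 0.0;
};

// Parses a per-word line such as
//   looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
// Parsing stops at the first token that does not contain '='.
std::vector<ScoredWord> ParseQueryLine(const std::string& line) {
    std::vector<ScoredWord> out;
    std::istringstream in(line);
    std::string token;
    ScoredWord w;
    while (in >> token >> w.ngram_length >> w.score) {
        // Split on the last '=' so that words containing '=' still parse.
        const std::string::size_type eq = token.rfind('=');
        if (eq == std::string::npos) break;
        w.word = token.substr(0, eq);
        w.vocab_id = std::stoul(token.substr(eq + 1));
        out.push_back(w);
    }
    return out;
}

// True when the last word was matched by an n-gram of at least `order`
// words, e.g. order 3 for "is this trigram in the model?".
bool LastWordMatched(const std::vector<ScoredWord>& words, int order) {
    return !words.empty() && words.back().ngram_length >= order;
}
```

For the example line, the last entry is "a" with ngram_length 3, so LastWordMatched(words, 3) holds; the "Total: ... OOV: ..." summary line contains no '=' token and yields an empty result.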
>>>>>>>>>>>
>>>>>>>>>>> On 07/13/11 08:50, Marc LEGENDRE wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> I am trying to use the language models loaded by Moses ;
>>>>>>>>>>>>
>>>>>>>>>>>> I am using a 3-gram LM, and I need to know whether it contains a
>>>>>>>>>>>> given N-gram or not.
>>>>>>>>>>>> I tried to play around with
>>>>>>>>>>>> LanguageModelImplementation::GetValueForgotState(...),
>>>>>>>>>>>> but the boolean 'unknown' in the returned structure does not
>>>>>>>>>>>> seem to be what I'm looking for.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there any simple way of getting this piece of information?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Marc Legendre
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Moses-support mailing list
>>>>>>>>>>>> [email protected]
>>>>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
