Re: [Moses-support] Using Moses language models

Marc LEGENDRE Mon, 25 Jul 2011 07:52:02 -0700

Well, I actually committed to the augmLMResult branch.

I inserted a class between LMKen and LMSingleFactor to prevent the inclusion of 
kenlm headers.
(And yes, I now realize this may be the kind of thing you write in a commit 
message.)
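
To make the arrangement concrete, here is a rough, simplified sketch of the idea, not the committed code. The names LanguageModelKenBase and ToyKenModel are hypothetical, the context is a plain vector of strings instead of the real Moses word types, and LMResult is reduced to two fields. The point is only that the intermediate class declares the Ken-specific call without pulling in any kenlm header, and that the generic GetValueForgotState() forwards to it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified stand-in for the generic result (the real Moses struct may differ).
struct LMResult {
  float score;
  bool unknown;
};

// Hypothetical Ken-specific result: the generic result plus the matched n-gram length.
struct LMKenResult {
  LMResult result;
  unsigned char ngram_length;
};

// Stand-in for the generic single-factor LM interface.
class LanguageModelSingleFactor {
 public:
  virtual ~LanguageModelSingleFactor() {}
  virtual LMResult GetValueForgotState(
      const std::vector<std::string> &context) const = 0;
};

// Hypothetical intermediate class: declares the Ken-specific method
// without including any kenlm header.
class LanguageModelKenBase : public LanguageModelSingleFactor {
 public:
  virtual LMKenResult GetValueForgotStateKen(
      const std::vector<std::string> &context) const = 0;

  // The generic call simply drops the extra information.
  LMResult GetValueForgotState(
      const std::vector<std::string> &context) const override {
    return GetValueForgotStateKen(context).result;
  }
};

// Toy implementation standing in for the kenlm-backed class.
// The score and length here are made up for illustration only.
class ToyKenModel : public LanguageModelKenBase {
 public:
  LMKenResult GetValueForgotStateKen(
      const std::vector<std::string> &context) const override {
    LMKenResult r;
    r.result.score = -1.0f * static_cast<float>(context.size());
    r.result.unknown = context.empty();
    r.ngram_length = static_cast<unsigned char>(context.size());
    return r;
  }
};
```

Code that only knows the generic interface keeps calling GetValueForgotState(); code that knows it holds a Ken model can call GetValueForgotStateKen() and read ngram_length.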


Since the LanguageModelKen.h header now contains functions I want to use,
can we add it to the list of installed files? (And if so, how?)


Also, I can't get the regression tests to work.
I downloaded the test data and extracted it in /tmp; from what I read, 
this is the command I came up with:
./regression-testing/run-test-suite.pl --decoder-phrase=moses-cmd/src/moses 
--decoder-chart=moses-chart-cmd/src/moses_chart
But every test ends with a "MOSES CRASHED" message. (The same thing happens 
with the trunk build.)
I tried to understand why, and I noticed that the .ini files for the tests contain:
[lmodel-file]
0 0 3 moses-reg-test-data-5/lm/europarl.en.srilm.gz

Is that OK for kenlm?
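
For reference, this is how I am reading that line: four whitespace-separated fields. Below is a small sketch that splits them. The struct and its field names are mine (hypothetical), and my understanding that the first field selects the LM implementation, the second the factor and the third the order is an assumption I have not checked against the decoder source.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical field names for one [lmodel-file] entry.
struct LModelEntry {
  int implementation;  // assumed: which LM wrapper to load
  int factor;          // assumed: factor the LM applies to
  int order;           // assumed: n-gram order
  std::string path;    // path to the model file
};

// Split one line of the form "<implementation> <factor> <order> <path>".
inline LModelEntry ParseLModelLine(const std::string &line) {
  std::istringstream in(line);
  LModelEntry e;
  if (!(in >> e.implementation >> e.factor >> e.order >> e.path)) {
    throw std::runtime_error(
        "expected: <implementation> <factor> <order> <path>");
  }
  return e;
}
```

If that reading is right, the question is whether the value in the first field is one that a build with only kenlm can serve.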

Marc

----- Original message -----
> From: "Kenneth Heafield" <[email protected]>
> To: "Marc LEGENDRE" <[email protected]>
> Cc: [email protected], [email protected]
> Sent: Friday 22 July 2011 20:18:21
> Subject: Re: [Moses-support] Using Moses language models
> 
> Hi Marc,
> 
>       This sounds like a simple change, so a branch is probably too much
> overhead.  Please do one of the following:
> 
> 1. Send a patch as generated by diff -rupN $old $new. Do a make clean
>    first.
> 2. Attach the files you modified and send them, along with the revision
>    you based the changes on.
> 3. Use your branch (if you already made one).
> 
> Thanks,
> 
> Kenneth
> 
> On 07/22/11 04:21, Marc LEGENDRE wrote:
> > Well, we (me and the people I work with) were hoping not to have to
> > maintain a modified version of Moses.
> > 
> > Luckily, obviousness just hit me like a truck: if something is
> > specific to an LM, it does not have to be in the top layer.
> > Having a common interface does not prevent subclasses from having
> > specific behaviour. We could have a LanguageModelKen method, say
> > GetValueForgotStateKen(...), which would return something specific,
> > say an LMKenResult, which would contain an LMResult plus other things
> > like, say, an ngram_length field :-).
> > And the virtual GetValueForgotState() method would simply return
> > the LMResult from there.
> > 
> > This way, no need to break the high-level API,
> > and no extra maintenance cost for us (me and the peop... Well, you
> > know).
> > 
> > ----- Original message -----
> >> From: "Hieu Hoang" <[email protected]>
> >> To: "Kenneth Heafield" <[email protected]>
> >> Cc: [email protected]
> >> Sent: Friday 22 July 2011 04:50:14
> >> Subject: Re: [Moses-support] Using Moses language models
> >>
> >>
> >> true, & there's no right answer to it.
> >>
> >> I suppose 1 goal of the trunk is to make sure that the core
> >> functionality of translating isn't affected too much, in terms of
> >> quality, speed, or memory. Another goal is to make sure not to
> >> overburden the API with things no-one else uses or implements.
> >>
> >> therefore, i think a good strategy is to branch & do what you like
> >>
> >>
> >> On 21 July 2011 22:46, Kenneth Heafield < [email protected] >
> >> wrote:
> >>
> >>
> >> Marc makes a good point. When one language model provides more
> >> information than do other language models, it's difficult to
> >> maintain a common abstraction layer. Currently we're looking at
> >> n-gram length. SRILM doesn't provide access to that (but you can
> >> get right-looking state length which is usually the same thing).
> >>
> >> I'm working on making this issue more severe with left-looking
> >> state optimization and explicit hypothesis bounds. How do we change
> >> the decoder to use these features if not all of the language models
> >> support them?
> >>
> >> Maybe another class in the language model hierarchy supporting
> >> these additional features. But it's going to make the decoder look
> >> ugly if you want to support both.
> >>
> >>
> >>
> >>
> >> On 07/21/11 11:14, Hieu Hoang wrote:
> >>> hi marc,
> >>>
> >>> it'll be good for people to see your changes.
> >>>
> >>> i suppose you should create a branch and make your changes in
> >>> there.
> >>>
> >>> If there are other people interested, you can point them to your
> >>> branch.
> >>> If more people are interested and it doesn't affect other people
> >>> too much, then we can move it to trunk.
> >>>
> >>> i'll email you offline with svn details
> >>>
> >>> On 21/07/2011 15:16, Marc LEGENDRE wrote:
> >>>> Alright, I gave this a try, and it did it for me.
> >>>> With kenlm, it is a ridiculously straightforward modification,
> >>>> but now I'm not sure how I can submit it:
> >>>> on one hand, I am not a "machine translation guy" and I don't
> >>>> imagine myself digging in every other LM to find how to set the
> >>>> ngram_length value; and on the other hand I would feel guilty to
> >>>> submit a 10-line patch and say "Guys, I need this, would you mind
> >>>> committing it and making the necessary modifications in every
> >>>> other wrapper yourselves?"
> >>>>
> >>>> How do you, Moses developers, feel about this?
> >>>> Is it acceptable / outrageously stupid if I set the value to -1 in
> >>>> the other wrappers, maybe with a TODO, and properly document it in
> >>>> the super class?
> >>>>
> >>>> ----- Original message -----
> >>>>> From: "Kenneth Heafield"< [email protected] >
> >>>>> To: [email protected]
> >>>>> Sent: Wednesday 13 July 2011 20:53:46
> >>>>> Subject: Re: [Moses-support] Using Moses language models
> >>>>>
> >>>>> I'd suggest adding a ngram_length member to LMResult then
> >>>>> modifying each model's wrapper (or just mine) to set that value.
> >>>>>
> >>>>> You're welcome to move stuff from LanguageModelKen.cpp to
> >>>>> LanguageModelKen.h as necessary. I chose this setup to minimize
> >>>>> unnecessary includes.
> >>>>>
> >>>>> Kenneth
> >>>>>
> >>>>> On 07/13/11 14:33, Marc LEGENDRE wrote:
> >>>>>> Well, not only is the header not "public", so to speak (which I
> >>>>>> agree is not a major obstacle), but the desired pointer is also
> >>>>>> a private member of the class, and sadly lacks a getter.
> >>>>>> As far as I know, that means accessing it will involve
> >>>>>> questionable C++ tricks.
> >>>>>> (never tried, though)
> >>>>>>
> >>>>>> If modifying Moses is not too much of a chore, I'll give it a
> >>>>>> thought.
> >>>>>>
> >>>>>> Anyway, thank you for your answers.
> >>>>>>
> >>>>>> ----- Original message -----
> >>>>>>> From: "Hieu Hoang"< [email protected] >
> >>>>>>> To: [email protected]
> >>>>>>> Sent: Wednesday 13 July 2011 18:40:11
> >>>>>>> Subject: Re: [Moses-support] Using Moses language models
> >>>>>>> i guess lm::Model is specific to the ken lm implementation. If
> >>>>>>> you want to use it you should include the header yourself and
> >>>>>>> cast whatever you need to get the pointer.
> >>>>>>>
> >>>>>>> if you're feeling generous, maybe you can extend the moses LM
> >>>>>>> wrapper so that all LM implementations have the opportunity to
> >>>>>>> return the length of the n-gram match.
> >>>>>>>
> >>>>>>> On 13/07/2011 21:51, Marc LEGENDRE wrote:
> >>>>>>>> The length of the n-gram match is sufficient for what I want,
> >>>>>>>> indeed.
> >>>>>>>> I figured out how to get it using kenlm directly, but as I am
> >>>>>>>> running the decoder, I wanted to use the already loaded LM.
> >>>>>>>>
> >>>>>>>> I first tried to dig my way through the Moses abstraction
> >>>>>>>> layers to retrieve a pointer to a lm::Model from kenlm, but the
> >>>>>>>> Moses::LanguageModelKen header is not part of the public
> >>>>>>>> headers of Moses; that's why I tried to use only the Moses
> >>>>>>>> interface.
> >>>>>>>>
> >>>>>>>> (I did not mention this alternative; if someone knows how to
> >>>>>>>> get such a pointer, I can carry on from there)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> ----- Original message -----
> >>>>>>>>> From: "Kenneth Heafield"< [email protected] >
> >>>>>>>>> To: "Marc LEGENDRE"< [email protected] >
> >>>>>>>>> Sent: Wednesday 13 July 2011 16:12:27
> >>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
> >>>>>>>>> The definition of unknown is that the word you asked for (the
> >>>>>>>>> rightmost one) is mapped to <unk>, i.e. an OOV.
> >>>>>>>>>
> >>>>>>>>> Are you looking for:
> >>>>>>>>>
> >>>>>>>>> 1) Length of n-gram matched in the model
> >>>>>>>>>
> >>>>>>>>> or
> >>>>>>>>>
> >>>>>>>>> 2) Length of state you must keep for valid continuation to the
> >>>>>>>>> right
> >>>>>>>>>
> >>>>>>>>> These are slightly different things due to state minimization.
> >>>>>>>>> The moses abstraction layer does not return either in a general
> >>>>>>>>> way. However, if you're using KenLM, #2 is in the returned
> >>>>>>>>> state's valid_length_. Further, #1 is in
> >>>>>>>>> FullScoreReturn.ngram_length. So if you call KenLM directly
> >>>>>>>>> these are easy to obtain (and you can decide whether to expose
> >>>>>>>>> them through the Moses abstraction layer).
> >>>>>>>>>
> >>>>>>>>> Outside the decoder, you can run
> >>>>>>>>>
> >>>>>>>>> kenlm/query model_file null
> >>>>>>>>>
> >>>>>>>>> then provide your trigrams on stdin.
> >>>>>>>>>
> >>>>>>>>> Here's an example with kenlm/query kenlm/lm/test.arpa null
> >>>>>>>>>
> >>>>>>>>> looking on a
> >>>>>>>>> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
> >>>>>>>>> Total: -1.79818 OOV: 0
> >>>>>>>>>
> >>>>>>>>> The format is "word=vocab_id ngram_length score". So this is a
> >>>>>>>>> trigram in the model because "a=5 3" appears.
> >>>>>>>>>
> >>>>>>>>> On 07/13/11 08:50, Marc LEGENDRE wrote:
> >>>>>>>>>> Hello,
> >>>>>>>>>>
> >>>>>>>>>> I am trying to use the language models loaded by Moses.
> >>>>>>>>>>
> >>>>>>>>>> I am using a 3-gram LM, and I need to know whether it contains
> >>>>>>>>>> a given N-gram or not.
> >>>>>>>>>> I tried to play around with
> >>>>>>>>>> LanguageModelImplementation::GetValueForgotState(...),
> >>>>>>>>>> but the boolean 'unknown' in the returned structure does not
> >>>>>>>>>> seem to be what I'm looking for.
> >>>>>>>>>>
> >>>>>>>>>> Is there any simple way of getting this piece of information?
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Regards,
> >>>>>>>>>> Marc Legendre
> >>>>>>>>>> _______________________________________________
> >>>>>>>>>> Moses-support mailing list
> >>>>>>>>>> [email protected]
> >>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support