Re: [Moses-support] Using Moses language models

Marc LEGENDRE Wed, 24 Aug 2011 02:18:44 -0700

Hi,

I merged the trunk into my branch; it looks OK.
Could my small modification to LMKen.h/cpp finally be merged into the trunk?
(Not the useless changes to PhraseDictionaryTree.)


Thanks (and sorry for my slow response; I hope you remember me!)

Marc

----- Original message -----
> From: "Hieu Hoang" <[email protected]>
> To: "Marc LEGENDRE" <[email protected]>
> Cc: "Kenneth Heafield" <[email protected]>, [email protected]
> Sent: Wednesday 27 July 2011 13:34:35
> Subject: Re: [Moses-support] Using Moses language models
> 
> hi marc,
> 
> thx for the commits.
> 
> the regression tests probably failed because the decoder wasn't compiled
> with SRI or IRST LM, which some of the regression tests specify. I
> compiled your branch & it passes.
> 
> I suppose for convenience, we should change it to use KenLM, with
> specific tests for IRST & SRI.
> 
> On 25/07/2011 21:51, Marc LEGENDRE wrote:
> > Well, I actually committed in the augmLMResult branch.
> >
> > I inserted a class between LMKen and LMSingleFactor to prevent the
> > inclusion of kenlm headers.
> > (And yes, I now realize this may be the kind of thing you write in
> > a commit message.)
> >
> > Since the LanguageModelKen.h header now contains functions I want to use,
> > can we add it to the list of installed files? (And how?)
> >
> >
> > Also, I can't get the regression tests to work.
> > I downloaded the test data and extracted it in /tmp; I read what
> > I found, and this is the command I came up with:
> > ./regression-testing/run-test-suite.pl
> > --decoder-phrase=moses-cmd/src/moses
> > --decoder-chart=moses-chart-cmd/src/moses_chart
> > But every test ends with a "MOSES CRASHED" message. (And the same
> > thing happens with the trunk build.)
> > I tried to understand, and I noticed that the .ini files for the tests
> > contain:
> > [lmodel-file]
> > 0 0 3 moses-reg-test-data-5/lm/europarl.en.srilm.gz
> >
> > Is that OK for kenlm ?
> >
> > Marc
> >
> > ----- Original message -----
> >> From: "Kenneth Heafield" <[email protected]>
> >> To: "Marc LEGENDRE" <[email protected]>
> >> Cc: [email protected], [email protected]
> >> Sent: Friday 22 July 2011 20:18:21
> >> Subject: Re: [Moses-support] Using Moses language models
> >>
> >> Hi Marc,
> >>
> >>    This sounds like a simple change, so a branch is probably too much
> >> overhead.  Please do one of the following:
> >>
> >> 1. Send a patch as generated by diff -rupN $old $new .  Do a make clean
> >> first.
> >> 2. Attach the files you modified and send them, along with the revision
> >> you based changes on.
> >> 3. Make a branch (if you already did).
> >>
> >> Thanks,
> >>
> >> Kenneth
> >>
> >> On 07/22/11 04:21, Marc LEGENDRE wrote:
> >>> Well, we (me and the people I work with) were hoping not to have to
> >>> maintain a modified version of Moses.
> >>>
> >>> Luckily, the obvious just hit me like a truck: if something is
> >>> specific to an LM, it does not have to be in the top layer.
> >>> Having a common interface does not prevent subclasses from having
> >>> specific behaviour: we could have a LanguageModelKen method, say
> >>> GetValueForgotStateKen(...), which would return something specific,
> >>> say an LMKenResult, which would contain an LMResult plus other things
> >>> like, say, an ngram_length field :-).
> >>> And the virtual GetValueForgotState() method would simply return
> >>> the LMResult from there.
> >>>
> >>> This way, there is no need to break the high-level API,
> >>> and no extra maintenance cost for us (me and the peop... Well, you
> >>> know).
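
A minimal sketch of the layering described in the message above, assuming simplified stand-ins for the Moses types. The names LMKenResult and GetValueForgotStateKen come from the message; LanguageModelSketch, LanguageModelKenSketch, the two-field LMResult and the placeholder scoring are hypothetical and are not the actual Moses or KenLM code.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for the generic result shared by all LM wrappers.
struct LMResult {
  float score;
  bool unknown;
};

// Hypothetical KenLM-specific result: the generic part plus extra detail.
struct LMKenResult {
  LMResult base;
  unsigned ngram_length;
};

// Hypothetical common interface; only the generic result is visible here.
class LanguageModelSketch {
public:
  virtual ~LanguageModelSketch() {}
  virtual LMResult GetValueForgotState(
      const std::vector<std::string>& context) const = 0;
};

class LanguageModelKenSketch : public LanguageModelSketch {
public:
  // Wrapper-specific entry point returning the richer result.
  LMKenResult GetValueForgotStateKen(
      const std::vector<std::string>& context) const {
    LMKenResult result;
    // Placeholder numbers only; a real wrapper would query the loaded model.
    result.base.score = -1.0f * static_cast<float>(context.size());
    result.base.unknown = context.empty();
    result.ngram_length = static_cast<unsigned>(context.size());
    return result;
  }

  // The generic virtual method simply forwards the generic part.
  LMResult GetValueForgotState(
      const std::vector<std::string>& context) const override {
    return GetValueForgotStateKen(context).base;
  }
};
```

Code that only knows the common interface keeps calling GetValueForgotState(); code that holds a LanguageModelKenSketch can call GetValueForgotStateKen() and read ngram_length, so the high-level API is left untouched.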
> >>>
> >>> ----- Original message -----
> >>>> From: "Hieu Hoang" <[email protected]>
> >>>> To: "Kenneth Heafield" <[email protected]>
> >>>> Cc: [email protected]
> >>>> Sent: Friday 22 July 2011 04:50:14
> >>>> Subject: Re: [Moses-support] Using Moses language models
> >>>>
> >>>>
> >>>> true, & there's no right answer to it.
> >>>>
> >>>> I suppose 1 goal of the trunk is to make sure that the core
> >>>> functionality of translating isn't affected too much, in terms of
> >>>> quality, speed, or memory. Another goal is to make sure not to overburden
> >>>> the API with things no-one else uses or implements.
> >>>>
> >>>> therefore, i think a good strategy is to branch & do what you like
> >>>>
> >>>>
> >>>> On 21 July 2011 22:46, Kenneth Heafield <[email protected]> wrote:
> >>>>
> >>>>
> >>>> Marc makes a good point. When one language model provides more
> >>>> information than do other language models, it's difficult to maintain a
> >>>> common abstraction layer. Currently we're looking at n-gram length.
> >>>> SRILM doesn't provide access to that (but you can get right-looking
> >>>> state length which is usually the same thing).
> >>>>
> >>>> I'm working on making this issue more severe with left-looking state
> >>>> optimization and explicit hypothesis bounds. How do we change the
> >>>> decoder to use these features if not all of the language models support
> >>>> them?
> >>>>
> >>>> Maybe another class in the language model hierarchy supporting these
> >>>> additional features. But it's going to make the decoder look ugly if
> >>>> you want to support both.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 07/21/11 11:14, Hieu Hoang wrote:
> >>>>> hi marc,
> >>>>>
> >>>>> it'll be good for people to see your changes.
> >>>>>
> >>>>> i suppose you should create a branch and make your changes in there.
> >>>>>
> >>>>> If there are other people interested, you can point them to your
> >>>>> branch.
> >>>>> If more people are interested and it doesn't affect other people too
> >>>>> much, then we can move it to trunk.
> >>>>>
> >>>>> i'll email you offline with svn details
> >>>>>
> >>>>> On 21/07/2011 15:16, Marc LEGENDRE wrote:
> >>>>>> Alright, I gave this a try, and it worked for me.
> >>>>>> With kenlm, it is a ridiculously straightforward modification,
> >>>>>> but now I'm not sure how I can submit it:
> >>>>>> on one hand, I am not a "machine translation guy" and I can't
> >>>>>> imagine myself digging into every other LM to find out how to set
> >>>>>> the ngram_length value;
> >>>>>> and on the other hand I would feel guilty submitting a 10-line
> >>>>>> patch and saying
> >>>>>> "Guys, I need this, would you mind committing it and making the
> >>>>>> necessary modifications in every other wrapper yourselves?"
> >>>>>>
> >>>>>> How do you, Moses developers, feel about this?
> >>>>>> Is it acceptable / outrageously stupid if I set the value to -1 in
> >>>>>> the other wrappers, maybe with a TODO, and properly document it in
> >>>>>> the super class?
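
A small sketch of the "-1 in the other wrappers" idea raised above, assuming a simplified LMResult. The constant kNgramLengthUnavailable, the two Make... helpers and FullNgramPresence are hypothetical names introduced for illustration; they are not part of Moses.

```cpp
// Sentinel meaning "this wrapper cannot report the matched n-gram length".
const int kNgramLengthUnavailable = -1;

// Simplified stand-in for the generic result, extended with ngram_length.
struct LMResult {
  float score;
  bool unknown;
  int ngram_length;  // kNgramLengthUnavailable when the wrapper cannot tell
};

// What a wrapper without n-gram length support would return (the TODO case).
LMResult MakeResultWithoutLength(float score, bool unknown) {
  LMResult r;
  r.score = score;
  r.unknown = unknown;
  r.ngram_length = kNgramLengthUnavailable;
  return r;
}

// What a wrapper that knows the matched length would return.
LMResult MakeResultWithLength(float score, bool unknown, int ngram_length) {
  LMResult r;
  r.score = score;
  r.unknown = unknown;
  r.ngram_length = ngram_length;
  return r;
}

enum NgramPresence { kPresent, kAbsent, kCannotTell };

// Decides whether the full queried n-gram was matched, while forcing callers
// to handle wrappers that only ever report the sentinel.
NgramPresence FullNgramPresence(const LMResult& r, int queried_length) {
  if (r.ngram_length == kNgramLengthUnavailable) return kCannotTell;
  return r.ngram_length == queried_length ? kPresent : kAbsent;
}
```

The three-way return value is a design choice of this sketch: it keeps a caller from silently treating "not supported by this wrapper" as "n-gram not in the model".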
> >>>>>>
> >>>>>> ----- Original message -----
> >>>>>>> From: "Kenneth Heafield" <[email protected]>
> >>>>>>> To: [email protected]
> >>>>>>> Sent: Wednesday 13 July 2011 20:53:46
> >>>>>>> Subject: Re: [Moses-support] Using Moses language models
> >>>>>>>
> >>>>>>> I'd suggest adding an ngram_length member to LMResult, then modifying
> >>>>>>> each model's wrapper (or just mine) to set that value.
> >>>>>>>
> >>>>>>> You're welcome to move stuff from LanguageModelKen.cpp to
> >>>>>>> LanguageModelKen.h as necessary. I chose this setup to minimize
> >>>>>>> unnecessary includes.
> >>>>>>>
> >>>>>>> Kenneth
> >>>>>>>
> >>>>>>> On 07/13/11 14:33, Marc LEGENDRE wrote:
> >>>>>>>> Well, not only is the header not "public", so to speak (which I
> >>>>>>>> agree is not a major obstacle),
> >>>>>>>> but the desired pointer is also a private member of the class, and
> >>>>>>>> sadly lacks a getter.
> >>>>>>>> As far as I know, that means accessing it would involve
> >>>>>>>> questionable C++ tricks.
> >>>>>>>> (never tried, though)
> >>>>>>>>
> >>>>>>>> If modifying Moses is not too much of a chore, I'll give it a
> >>>>>>>> thought.
> >>>>>>>>
> >>>>>>>> Anyway, thank you for your answers.
> >>>>>>>>
> >>>>>>>> ----- Original message -----
> >>>>>>>>> From: "Hieu Hoang" <[email protected]>
> >>>>>>>>> To: [email protected]
> >>>>>>>>> Sent: Wednesday 13 July 2011 18:40:11
> >>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
> >>>>>>>>>
> >>>>>>>>> i guess lm::Model is specific to the ken lm implementation. If you
> >>>>>>>>> want to use it you should include the header yourself and cast
> >>>>>>>>> whatever you need to get the pointer.
> >>>>>>>>>
> >>>>>>>>> if you're feeling generous, maybe you can extend the moses LM wrapper
> >>>>>>>>> so that all LM implementations have the opportunity to return the
> >>>>>>>>> length of the n-gram match.
> >>>>>>>>>
> >>>>>>>>> On 13/07/2011 21:51, Marc LEGENDRE wrote:
> >>>>>>>>>> The length of the n-gram match is sufficient for what I want,
> >>>>>>>>>> indeed.
> >>>>>>>>>> I figured out how to get it using kenlm directly, but as I am
> >>>>>>>>>> running the decoder, I wanted to use the already loaded LM.
> >>>>>>>>>>
> >>>>>>>>>> I first tried to dig my way through the Moses abstraction layers to
> >>>>>>>>>> retrieve a pointer to a lm::Model from kenlm, but the
> >>>>>>>>>> Moses::LanguageModelKen header is not part of the public headers of
> >>>>>>>>>> Moses; that's why I tried to use only the Moses interface.
> >>>>>>>>>>
> >>>>>>>>>> (I did not mention this alternative; if someone knows how to
> >>>>>>>>>> get such a pointer, I can carry on from there.)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> ----- Original message -----
> >>>>>>>>>>> From: "Kenneth Heafield" <[email protected]>
> >>>>>>>>>>> To: "Marc LEGENDRE" <[email protected]>
> >>>>>>>>>>> Sent: Wednesday 13 July 2011 16:12:27
> >>>>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
> >>>>>>>>>>>
> >>>>>>>>>>> The definition of unknown is that the word you asked for (the
> >>>>>>>>>>> rightmost one) is mapped to <unk>, i.e. an OOV.
> >>>>>>>>>>>
> >>>>>>>>>>> Are you looking for:
> >>>>>>>>>>>
> >>>>>>>>>>> 1) Length of n-gram matched in the model
> >>>>>>>>>>>
> >>>>>>>>>>> or
> >>>>>>>>>>>
> >>>>>>>>>>> 2) Length of state you must keep for valid continuation to the right
> >>>>>>>>>>>
> >>>>>>>>>>> These are slightly different things due to state minimization. The
> >>>>>>>>>>> moses abstraction layer does not return either in a general way.
> >>>>>>>>>>> However, if you're using KenLM, #2 is in the returned state's
> >>>>>>>>>>> valid_length_. Further, #1 is in FullScoreReturn.ngram_length. So if
> >>>>>>>>>>> you call KenLM directly these are easy to obtain (and you can decide
> >>>>>>>>>>> whether to expose them through the Moses abstraction layer).
> >>>>>>>>>>>
> >>>>>>>>>>> Outside the decoder, you can run
> >>>>>>>>>>>
> >>>>>>>>>>> kenlm/query model_file null
> >>>>>>>>>>>
> >>>>>>>>>>> then provide your trigrams on stdin.
> >>>>>>>>>>>
> >>>>>>>>>>> Here's an example with kenlm/query kenlm/lm/test.arpa null
> >>>>>>>>>>>
> >>>>>>>>>>> looking on a
> >>>>>>>>>>> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
> >>>>>>>>>>> Total: -1.79818 OOV: 0
> >>>>>>>>>>>
> >>>>>>>>>>> The format is "word=vocab_id ngram_length score". So this is a
> >>>>>>>>>>> trigram in the model because "a=5 3" appears.
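
As a sketch of how the "word=vocab_id ngram_length score" output described above could be consumed, the following parses one such line and checks whether the last word completes an n-gram of a given order. QueryToken, ParseQueryLine and LastWordCompletesNgram are hypothetical helper names, and the parser assumes whitespace-separated fields exactly as shown in the example; other versions of the query tool may format their output differently.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <vector>

// One "word=vocab_id ngram_length score" triple from the query output.
struct QueryToken {
  std::string word;
  unsigned long vocab_id;
  unsigned ngram_length;
  double score;
};

// Parses whitespace-separated triples until the input ends, a "Total:"
// summary begins, or a field is malformed.
std::vector<QueryToken> ParseQueryLine(const std::string& line) {
  std::vector<QueryToken> tokens;
  std::istringstream in(line);
  std::string word_and_id;
  while (in >> word_and_id) {
    if (word_and_id == "Total:") break;
    // Split on the last '=' so that words containing '=' still parse.
    const std::string::size_type eq = word_and_id.rfind('=');
    if (eq == std::string::npos) break;
    QueryToken t;
    t.word = word_and_id.substr(0, eq);
    t.vocab_id = std::strtoul(word_and_id.c_str() + eq + 1, NULL, 10);
    if (!(in >> t.ngram_length >> t.score)) break;
    tokens.push_back(t);
  }
  return tokens;
}

// True when the last word was matched with an n-gram of exactly `order`
// words, e.g. order 3 for "is this trigram in the model?".
bool LastWordCompletesNgram(const std::vector<QueryToken>& tokens,
                            unsigned order) {
  return !tokens.empty() && tokens.back().ngram_length == order;
}
```

Applied to the example line above, the last token is "a" with ngram_length 3, so LastWordCompletesNgram(tokens, 3) holds, which matches the reasoning that the trigram is in the model.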
> >>>>>>>>>>>
> >>>>>>>>>>> On 07/13/11 08:50, Marc LEGENDRE wrote:
> >>>>>>>>>>>> Hello,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am trying to use the language models loaded by Moses.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I am using a 3-gram LM, and I need to know whether it contains a
> >>>>>>>>>>>> given N-gram or not.
> >>>>>>>>>>>> I tried to play around with
> >>>>>>>>>>>> LanguageModelImplementation::GetValueForgotState(...),
> >>>>>>>>>>>> but the boolean 'unknown' in the returned structure does not seem to
> >>>>>>>>>>>> be what I'm looking for.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Is there any simple way of getting this piece of information?
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> Regards,
> >>>>>>>>>>>> Marc Legendre
> >>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>> Moses-support mailing list
> >>>>>>>>>>>> [email protected]
> >>>>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
> 

