Re: [Moses-support] Using Moses language models

Marc LEGENDRE Wed, 24 Aug 2011 03:21:03 -0700

Yes I understood this from another discussion.
The point in PhraseDictionaryTree.cpp was just memory management.
(Admittedly, to silence Valgrind; but hey, don't we all strive for perfection? :-)


I don't need it; I guess I should have removed it from my branch if I wanted
to merge.
It's done.

----- Original Message -----
> From: "Kenneth Heafield" <[email protected]>
> To: [email protected]
> Sent: Wednesday, 24 August 2011 11:52:19
> Subject: Re: [Moses-support] Using Moses language models
>
> I support depending on Boost but sadly some people don't.
> PhraseDictionaryTree.cpp:3 in your branch includes a boost header.
>
> Kenneth
>
> On 08/24/11 10:17, Marc LEGENDRE wrote:
> > Hi,
> >
> > I merged the trunk into my branch; it looks ok.
> > Could my small modification to LMKen.h/cpp finally be merged into
> > the trunk?
> > (Not the useless changes to PhraseDictionaryTree.)
> >
> > Thanks. (And sorry for being slow to respond; I hope you remember me!)
> >
> > Marc
> >
> > ----- Original Message -----
> >> From: "Hieu Hoang" <[email protected]>
> >> To: "Marc LEGENDRE" <[email protected]>
> >> Cc: "Kenneth Heafield" <[email protected]>,
> >> [email protected]
> >> Sent: Wednesday, 27 July 2011 13:34:35
> >> Subject: Re: [Moses-support] Using Moses language models
> >>
> >> hi marc,
> >>
> >> thx for the commits.
> >>
> >> the regression test probably failed because the decoder wasn't compiled
> >> with SRI or IRST LM, which some of the regression tests specify. I
> >> compiled your branch & it passes.
> >>
> >> I suppose for convenience, we should change it to use KenLM, with
> >> specific tests for IRST & SRI.
> >>
> >> On 25/07/2011 21:51, Marc LEGENDRE wrote:
> >>> Well, I actually committed in the augmLMResult branch.
> >>>
> >>> I inserted a class between LMKen and LMSingleFactor to prevent the
> >>> inclusion of kenlm headers.
> >>> (And yes, I now realize this may be the kind of thing you write in
> >>> a commit message.)
> >>>
> >>> Since the LanguageModelKen.h header now contains functions I want to use,
> >>> can we add it to the list of installed files? (And how?)
> >>>
> >>>
> >>> Also, I can't get the regression tests to work.
> >>> I downloaded the test data and extracted it in /tmp; I read what
> >>> I found, and this is the command I came up with:
> >>> ./regression-testing/run-test-suite.pl
> >>> --decoder-phrase=moses-cmd/src/moses
> >>> --decoder-chart=moses-chart-cmd/src/moses_chart
> >>> But every test ends with a "MOSES CRASHED" message. (And the same
> >>> thing happens with the trunk build.)
> >>> I tried to understand, and I noticed that the .ini files for the tests
> >>> contain:
> >>> [lmodel-file]
> >>> 0 0 3 moses-reg-test-data-5/lm/europarl.en.srilm.gz
> >>>
> >>> Is that OK for kenlm?
> >>>
> >>> Marc
> >>>
> >>> ----- Original Message -----
> >>>> From: "Kenneth Heafield" <[email protected]>
> >>>> To: "Marc LEGENDRE" <[email protected]>
> >>>> Cc: [email protected], [email protected]
> >>>> Sent: Friday, 22 July 2011 20:18:21
> >>>> Subject: Re: [Moses-support] Using Moses language models
> >>>>
> >>>> Hi Marc,
> >>>>
> >>>> This sounds like a simple change, so a branch is probably too much
> >>>> overhead.  Please do one of the following:
> >>>>
> >>>> 1. Send a patch as generated by diff -rupN $old $new .  Do a make clean
> >>>> first.
> >>>> 2. Attach the files you modified and send them, along with the revision
> >>>> you based changes on.
> >>>> 3. Make a branch (if you already did).
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Kenneth
> >>>>
> >>>> On 07/22/11 04:21, Marc LEGENDRE wrote:
> >>>>> Well, we (me and the people I work with) were hoping not to have to
> >>>>> maintain a modified version of Moses.
> >>>>>
> >>>>> Luckily, obviousness just hit me like a truck: if something is
> >>>>> specific to an LM, it does not have to be in the top layer.
> >>>>> Having a common interface does not prevent subclasses from having
> >>>>> specific behaviour. We could have a LanguageModelKen method, say
> >>>>> GetValueForgotStateKen(...), which would return something specific,
> >>>>> say an LMKenResult, which would contain an LMResult plus other things
> >>>>> like, say, an ngram_length field :-).
> >>>>> And the virtual GetValueForgotState() method would simply return
> >>>>> the LMResult from there.
> >>>>>
> >>>>> This way, no need to break the high-level API,
> >>>>> and no extra maintenance cost for us (me and the peop... Well, you know).
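
[Editor's sketch] The design Marc describes above can be illustrated roughly as follows. The names LMResult, LMKenResult, ngram_length, GetValueForgotState and GetValueForgotStateKen come from his message; everything else is an assumption made for illustration. In particular, the method signatures are simplified and are not the real Moses ones, and the "model" is a hypothetical set of known n-grams standing in for KenLM.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

namespace sketch {

typedef std::vector<std::string> Ngram;  // hypothetical stand-in for a context

// Stand-in for the generic result every LM wrapper returns.
struct LMResult {
  float score;
  bool unknown;
};

// Model-specific result: the generic part plus the matched n-gram length.
struct LMKenResult {
  LMResult base;
  unsigned ngram_length;
};

// Stand-in for the common interface (signature simplified for the sketch).
class LanguageModelImplementation {
 public:
  virtual ~LanguageModelImplementation() {}
  virtual LMResult GetValueForgotState(const Ngram &context) const = 0;
};

class LanguageModelKen : public LanguageModelImplementation {
 public:
  explicit LanguageModelKen(const std::set<Ngram> &known) : known_(known) {}

  // Model-specific entry point: returns the richer result.
  LMKenResult GetValueForgotStateKen(const Ngram &context) const {
    assert(!context.empty());
    LMKenResult r;
    r.base.score = 0.0f;  // placeholder: the toy model has no probabilities
    r.ngram_length = 0;
    // Longest suffix of the context that the toy model contains.
    for (std::size_t start = 0; start < context.size(); ++start) {
      Ngram suffix(context.begin() + start, context.end());
      if (known_.count(suffix)) {
        r.ngram_length = static_cast<unsigned>(suffix.size());
        break;
      }
    }
    // "unknown" here means that not even the rightmost word was found.
    r.base.unknown = (r.ngram_length == 0);
    return r;
  }

  // The generic virtual method simply forwards and drops the extra field.
  virtual LMResult GetValueForgotState(const Ngram &context) const {
    return GetValueForgotStateKen(context).base;
  }

 private:
  std::set<Ngram> known_;
};

}  // namespace sketch
```

Callers that only hold a LanguageModelImplementation see the unchanged generic result, while code that knows it has a LanguageModelKen can call the specific method to read ngram_length.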
> >>>>>
> >>>>> ----- Original Message -----
> >>>>>> From: "Hieu Hoang" <[email protected]>
> >>>>>> To: "Kenneth Heafield" <[email protected]>
> >>>>>> Cc: [email protected]
> >>>>>> Sent: Friday, 22 July 2011 04:50:14
> >>>>>> Subject: Re: [Moses-support] Using Moses language models
> >>>>>>
> >>>>>>
> >>>>>> true, & there's no right answer to it.
> >>>>>>
> >>>>>> I suppose 1 goal of the trunk is to make sure that the core
> >>>>>> functionality of translating isn't affected too much, in terms of
> >>>>>> quality, speed, or memory. Another goal is not to overburden
> >>>>>> the API with things no-one else uses or implements.
> >>>>>>
> >>>>>> therefore, i think a good strategy is to branch & do what you like
> >>>>>>
> >>>>>>
> >>>>>> On 21 July 2011 22:46, Kenneth Heafield <[email protected]> wrote:
> >>>>>>
> >>>>>>
> >>>>>> Marc makes a good point. When one language model provides more
> >>>>>> information than do other language models, it's difficult to maintain
> >>>>>> a common abstraction layer. Currently we're looking at n-gram length.
> >>>>>> SRILM doesn't provide access to that (but you can get right-looking
> >>>>>> state length which is usually the same thing).
> >>>>>>
> >>>>>> I'm working on making this issue more severe with left-looking state
> >>>>>> optimization and explicit hypothesis bounds. How do we change the
> >>>>>> decoder to use these features if not all of the language models
> >>>>>> support them?
> >>>>>>
> >>>>>> Maybe another class in the language model hierarchy supporting these
> >>>>>> additional features. But it's going to make the decoder look ugly if
> >>>>>> you want to support both.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 07/21/11 11:14, Hieu Hoang wrote:
> >>>>>>> hi marc,
> >>>>>>>
> >>>>>>> it'll be good for people to see your changes.
> >>>>>>>
> >>>>>>> i suppose you should create a branch and make your changes in there.
> >>>>>>>
> >>>>>>> If there are other people interested, you can point them to your
> >>>>>>> branch.
> >>>>>>> If more people are interested and it doesn't affect other people too
> >>>>>>> much, then we can move it to trunk.
> >>>>>>>
> >>>>>>> i'll email you offline with svn details
> >>>>>>>
> >>>>>>> On 21/07/2011 15:16, Marc LEGENDRE wrote:
> >>>>>>>> Alright, I gave this a try, and it did the job for me.
> >>>>>>>> With kenlm, it is a ridiculously straightforward modification,
> >>>>>>>> but now I'm not sure how I can submit it:
> >>>>>>>> on one hand, I am not a "machine translation guy" and I don't
> >>>>>>>> imagine myself digging in every other LM to find how to set the
> >>>>>>>> ngram_length value;
> >>>>>>>> and on the other hand I would feel guilty to submit a 10-line
> >>>>>>>> patch and say
> >>>>>>>> "Guys, I need this, would you mind committing it and making the
> >>>>>>>> necessary modifications in every other wrapper yourselves?"
> >>>>>>>>
> >>>>>>>> How do you, Moses developers, feel about this?
> >>>>>>>> Is it acceptable / outrageously stupid if I set the value to -1 in
> >>>>>>>> the other wrappers, maybe with a TODO, and properly document it in
> >>>>>>>> the super class?
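
[Editor's sketch] A minimal illustration of the "-1 in the other wrappers" idea, assuming the shared result type gains an ngram_length member. The constant name, the two helper functions and the field layout are hypothetical and not taken from the Moses sources.

```cpp
#include <cassert>

namespace sentinel_sketch {

// Documented once, next to the shared result type: -1 means "this wrapper
// does not report the matched n-gram length".
const int kNgramLengthUnavailable = -1;

// Simplified stand-in for the shared result structure.
struct LMResult {
  float score;
  bool unknown;
  int ngram_length;  // meaningful only if != kNgramLengthUnavailable
};

// What a wrapper that cannot report the length would return.
// TODO (hypothetical): set a real value once the backend exposes it.
inline LMResult MakeResultWithoutLength(float score, bool unknown) {
  LMResult r;
  r.score = score;
  r.unknown = unknown;
  r.ngram_length = kNgramLengthUnavailable;
  return r;
}

// Callers check this before trusting ngram_length.
inline bool HasNgramLength(const LMResult &r) {
  return r.ngram_length != kNgramLengthUnavailable;
}

}  // namespace sentinel_sketch
```

The cost of this approach is that every caller has to remember the check, which is part of what the later messages in this thread weigh against a model-specific method.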
> >>>>>>>>
> >>>>>>>> ----- Original Message -----
> >>>>>>>>> From: "Kenneth Heafield" <[email protected]>
> >>>>>>>>> To: [email protected]
> >>>>>>>>> Sent: Wednesday, 13 July 2011 20:53:46
> >>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
> >>>>>>>>>
> >>>>>>>>> I'd suggest adding a ngram_length member to LMResult then modifying
> >>>>>>>>> each model's wrapper (or just mine) to set that value.
> >>>>>>>>>
> >>>>>>>>> You're welcome to move stuff from LanguageModelKen.cpp to
> >>>>>>>>> LanguageModelKen.h as necessary. I chose this setup to minimize
> >>>>>>>>> unnecessary includes.
> >>>>>>>>>
> >>>>>>>>> Kenneth
> >>>>>>>>>
> >>>>>>>>> On 07/13/11 14:33, Marc LEGENDRE wrote:
> >>>>>>>>>> Well, not only is the header not "public", so to speak (which I
> >>>>>>>>>> agree is not a major obstacle),
> >>>>>>>>>> but the desired pointer is also a private member of the class,
> >>>>>>>>>> and sadly lacks a getter.
> >>>>>>>>>> As far as I know, it means that accessing it will involve
> >>>>>>>>>> questionable C++ tricks.
> >>>>>>>>>> (never tried, though)
> >>>>>>>>>>
> >>>>>>>>>> If modifying Moses is not too much of a chore, I'll give it a
> >>>>>>>>>> thought.
> >>>>>>>>>>
> >>>>>>>>>> Anyway, thank you for your answers.
> >>>>>>>>>>
> >>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>> From: "Hieu Hoang" <[email protected]>
> >>>>>>>>>>> To: [email protected]
> >>>>>>>>>>> Sent: Wednesday, 13 July 2011 18:40:11
> >>>>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
> >>>>>>>>>>>
> >>>>>>>>>>> i guess lm::Model is specific to the ken lm implementation. If you
> >>>>>>>>>>> want to use it you should include the header yourself and cast
> >>>>>>>>>>> whatever you need to get the pointer.
> >>>>>>>>>>>
> >>>>>>>>>>> if you're feeling generous, maybe you can extend the moses LM
> >>>>>>>>>>> wrapper so that all LM implementations have the opportunity to
> >>>>>>>>>>> return the length of the n-gram match.
> >>>>>>>>>>>
> >>>>>>>>>>> On 13/07/2011 21:51, Marc LEGENDRE wrote:
> >>>>>>>>>>>> The length of the n-gram match is sufficient for what I want,
> >>>>>>>>>>>> indeed.
> >>>>>>>>>>>> I figured out how to get it using kenlm directly, but as I am
> >>>>>>>>>>>> running the decoder, I wanted to use the already loaded LM.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I first tried to dig my way through the Moses abstraction layers
> >>>>>>>>>>>> to retrieve a pointer to a lm::Model from kenlm, but the
> >>>>>>>>>>>> Moses::LanguageModelKen header is not part of the public headers
> >>>>>>>>>>>> of Moses; that's why I tried to use only the Moses interface.
> >>>>>>>>>>>>
> >>>>>>>>>>>> (I did not mention this alternative; if someone knows how to
> >>>>>>>>>>>> get such a pointer, I can carry on from there.)
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> ----- Original Message -----
> >>>>>>>>>>>>> From: "Kenneth Heafield" <[email protected]>
> >>>>>>>>>>>>> To: "Marc LEGENDRE" <[email protected]>
> >>>>>>>>>>>>> Sent: Wednesday, 13 July 2011 16:12:27
> >>>>>>>>>>>>> Subject: Re: [Moses-support] Using Moses language models
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The definition of unknown is that the word you asked for (the
> >>>>>>>>>>>>> rightmost one) is mapped to <unk>, i.e. an OOV.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Are you looking for:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 1) Length of n-gram matched in the model
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> or
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 2) Length of state you must keep for valid continuation to the
> >>>>>>>>>>>>> right
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> These are slightly different things due to state minimization.
> >>>>>>>>>>>>> The moses abstraction layer does not return either in a general
> >>>>>>>>>>>>> way. However, if you're using KenLM, #2 is in the returned
> >>>>>>>>>>>>> state's valid_length_. Further, #1 is in
> >>>>>>>>>>>>> FullScoreReturn.ngram_length. So if you call KenLM directly
> >>>>>>>>>>>>> these are easy to obtain (and you can decide whether to expose
> >>>>>>>>>>>>> them through the Moses abstraction layer).
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Outside the decoder, you can run
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> kenlm/query model_file null
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> then provide your trigrams on stdin.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Here's an example with kenlm/query kenlm/lm/test.arpa null:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> looking on a
> >>>>>>>>>>>>> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
> >>>>>>>>>>>>> Total: -1.79818 OOV: 0
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The format is "word=vocab_id ngram_length score". So this is a
> >>>>>>>>>>>>> trigram in the model because "a=5 3" appears.
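
[Editor's sketch] For anyone post-processing this output, the "word=vocab_id ngram_length score" format described above could be parsed along these lines. ScoredWord, ParseQueryLine and LastWordMatched are hypothetical helper names, not part of KenLM, and the parser assumes the per-word fields are whitespace-separated exactly as in the example.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One scored word from a line of query output.
struct ScoredWord {
  std::string word;
  unsigned vocab_id;
  unsigned ngram_length;
  double score;
};

// Parses the per-word triples from one output line. Parsing stops at the
// first token that does not look like "word=vocab_id" (e.g. "Total:").
std::vector<ScoredWord> ParseQueryLine(const std::string &line) {
  std::vector<ScoredWord> out;
  std::istringstream in(line);
  std::string head;
  while (in >> head) {
    // Use the last '=' so that words containing '=' still split correctly.
    std::string::size_type eq = head.rfind('=');
    if (eq == std::string::npos) break;
    ScoredWord w;
    w.word = head.substr(0, eq);
    std::istringstream id(head.substr(eq + 1));
    if (!(id >> w.vocab_id)) break;
    if (!(in >> w.ngram_length >> w.score)) break;
    out.push_back(w);
  }
  return out;
}

// True if the last word on the line was scored with an n-gram of length >= n.
bool LastWordMatched(const std::vector<ScoredWord> &words, unsigned n) {
  assert(n >= 1);
  return !words.empty() && words.back().ngram_length >= n;
}
```

With the example line above, the last parsed entry is the word "a" with ngram_length 3, so LastWordMatched(words, 3) holds, which is the same check as spotting "a=5 3" by eye.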
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 07/13/11 08:50, Marc LEGENDRE wrote:
> >>>>>>>>>>>>>> Hello,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I am trying to use the language models loaded by Moses.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I am using a 3-gram LM, and I need to know whether it contains
> >>>>>>>>>>>>>> a given N-gram or not.
> >>>>>>>>>>>>>> I tried to play around with
> >>>>>>>>>>>>>> LanguageModelImplementation::GetValueForgotState(...),
> >>>>>>>>>>>>>> but the boolean 'unknown' in the returned structure does not
> >>>>>>>>>>>>>> seem to be what I'm looking for.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Is there any simple way of getting this piece of information?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Regards,
> >>>>>>>>>>>>>> Marc Legendre
> >>>>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>>>> Moses-support mailing list
> >>>>>>>>>>>>>> [email protected]
> >>>>>>>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support

