Re: [Moses-support] Using Moses language models

Marc LEGENDRE Fri, 22 Jul 2011 01:23:04 -0700

Well, we (me and the people I work with) were hoping not to have to maintain
a modified version of Moses.


Luckily, the obvious just hit me like a truck: if something is specific to an LM,
it does not have to be in the top layer.
Having a common interface does not prevent subclasses from having specific behaviour.
We could have a LanguageModelKen method, say GetValueForgotStateKen(...), which would
return something specific, say an LMKenResult, which would contain an LMResult plus
other things like, say, an ngram_length field :-).
The virtual GetValueForgotState() method would then simply return the LMResult
from there.

This way, no need to break the high level API,
and no extra maintenance cost for us (me and the peop... Well, you know).
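
A minimal sketch of that idea, under stated assumptions: the types below are simplified stand-ins rather than the real Moses declarations (the actual GetValueForgotState() signature differs), LMKenResult and GetValueForgotStateKen() are the hypothetical names proposed above, and the scoring is a placeholder where a real wrapper would query KenLM.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for the common result type (assumption: the real
// Moses struct may carry different or additional fields).
struct LMResult {
  float score;
  bool unknown;
};

// Hypothetical KenLM-specific result: the common LMResult plus extras.
struct LMKenResult {
  LMResult base;
  int ngram_length;
};

// Stand-in for the common interface; it stays unchanged for every LM.
class LanguageModelImplementation {
public:
  virtual ~LanguageModelImplementation() {}
  virtual LMResult GetValueForgotState(
      const std::vector<std::string> &context) const = 0;
};

class LanguageModelKen : public LanguageModelImplementation {
public:
  // KenLM-only entry point exposing the extra information.
  LMKenResult GetValueForgotStateKen(
      const std::vector<std::string> &context) const {
    // Placeholder values: a real wrapper would ask KenLM for the score
    // and for the length of the matched n-gram.
    LMKenResult result;
    result.base.score = -1.0f * static_cast<float>(context.size());
    result.base.unknown = context.empty();
    result.ngram_length = static_cast<int>(context.size());
    return result;
  }

  // The common virtual method simply forwards and drops the extras.
  LMResult GetValueForgotState(
      const std::vector<std::string> &context) const override {
    return GetValueForgotStateKen(context).base;
  }
};
```

Code that only knows the base class keeps calling GetValueForgotState(); code that holds a LanguageModelKen can call GetValueForgotStateKen() and read ngram_length, so nothing changes for the other wrappers.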

----- Original message -----
> From: "Hieu Hoang" <[email protected]>
> To: "Kenneth Heafield" <[email protected]>
> Cc: [email protected]
> Sent: Friday 22 July 2011 04:50:14
> Subject: Re: [Moses-support] Using Moses language models
> 
> 
> true, & there's no right answer to it.
> 
> I suppose 1 goal of the trunk is to make sure that the core
> functionality of translating isn't affected too much, in terms of
> quality, speed, or memory. Another goal is to make sure not to overburden
> the API with things no-one else uses or implements.
> 
> therefore, i think a good strategy is to branch & do what you like
> 
> 
> On 21 July 2011 22:46, Kenneth Heafield < [email protected] >
> wrote:
> 
> 
> Marc makes a good point. When one language model provides more
> information than do other language models, it's difficult to maintain
> a common abstraction layer. Currently we're looking at n-gram length.
> SRILM doesn't provide access to that (but you can get right-looking
> state length which is usually the same thing).
> 
> I'm working on making this issue more severe with left-looking state
> optimization and explicit hypothesis bounds. How do we change the
> decoder to use these features if not all of the language models
> support them?
> 
> Maybe another class in the language model hierarchy supporting these
> additional features. But it's going to make the decoder look ugly if
> you want to support both.
> 
> 
> 
> 
> On 07/21/11 11:14, Hieu Hoang wrote:
> > hi marc,
> > 
> > it'll be good for people to see your changes.
> > 
> > i suppose you should create a branch and make your changes in
> > there.
> > 
> > If there are other people interested, you can point them to your
> > branch.
> > If more people are interested and it doesn't affect other people
> > too much, then we can move it to trunk.
> > 
> > i'll email you offline with svn details
> > 
> > On 21/07/2011 15:16, Marc LEGENDRE wrote:
> >> Alright, I gave this a try, and it did it for me.
> >> With kenlm, it is a ridiculously straightforward modification,
> >> but now I'm not sure how I can submit it:
> >> on one hand, I am not a "machine translation guy" and I don't
> >> imagine myself digging in every other LM to find how to set the
> >> ngram_length value;
> >> and on the other hand I would feel guilty to submit a 10-line
> >> patch and say "Guys, I need this, would you mind committing it and
> >> doing the necessary modifications in every other wrapper yourselves?"
> >> 
> >> How do you, Moses developers, feel about this?
> >> Is it acceptable / outrageously stupid if I set the value to -1 in
> >> the other wrappers, maybe with a TODO, and properly document it in
> >> the super class?
> >> 
> >> ----- Original message -----
> >>> From: "Kenneth Heafield"< [email protected] >
> >>> To: [email protected]
> >>> Sent: Wednesday 13 July 2011 20:53:46
> >>> Subject: Re: [Moses-support] Using Moses language models
> >>> 
> >>> I'd suggest adding an ngram_length member to LMResult then
> >>> modifying each model's wrapper (or just mine) to set that value.
> >>> 
> >>> You're welcome to move stuff from LanguageModelKen.cpp to
> >>> LanguageModelKen.h as necessary. I chose this setup to minimize
> >>> unnecessary includes.
> >>> 
> >>> Kenneth
> >>> 
> >>> On 07/13/11 14:33, Marc LEGENDRE wrote:
> >>>> Well, not only is the header not "public", so to speak (which I
> >>>> agree is not a major obstacle),
> >>>> but the desired pointer is also a private member of the class,
> >>>> and sadly lacks a getter.
> >>>> As far as I know, it means that accessing it will involve
> >>>> questionable C++ tricks.
> >>>> (never tried, though)
> >>>> 
> >>>> If modifying Moses is not too much of a chore, I'll give it a
> >>>> thought.
> >>>> 
> >>>> Anyway, thank you for your answers.
> >>>> 
> >>>> ----- Original message -----
> >>>>> From: "Hieu Hoang"< [email protected] >
> >>>>> To: [email protected]
> >>>>> Sent: Wednesday 13 July 2011 18:40:11
> >>>>> Subject: Re: [Moses-support] Using Moses language models
> >>>>> i guess lm::Model is specific to the ken lm implementation. If
> >>>>> you want to use it you should include the header yourself and cast
> >>>>> whatever you need to get the pointer.
> >>>>> 
> >>>>> if you're feeling generous, maybe you can extend the moses LM
> >>>>> wrapper so that all LM implementations have the opportunity to
> >>>>> return the length of the n-gram match.
> >>>>> 
> >>>>> On 13/07/2011 21:51, Marc LEGENDRE wrote:
> >>>>>> The length of the n-gram match is sufficient for what I want,
> >>>>>> indeed.
> >>>>>> I figured out how to get it using kenlm directly, but as I
> >>>>>> am running the decoder, I wanted to use the already loaded LM.
> >>>>>> 
> >>>>>> I first tried to dig my way through the Moses abstraction
> >>>>>> layers to retrieve a pointer to a lm::Model from kenlm, but the
> >>>>>> Moses::LanguageModelKen header is not part of the public
> >>>>>> headers of Moses; that's why I tried to use only the Moses
> >>>>>> interface.
> >>>>>> 
> >>>>>> (I did not mention this alternative; if someone knows how
> >>>>>> to get such a pointer, I can carry on from there)
> >>>>>> 
> >>>>>> 
> >>>>>> ----- Original message -----
> >>>>>>> From: "Kenneth Heafield"< [email protected] >
> >>>>>>> To: "Marc LEGENDRE"< [email protected] >
> >>>>>>> Sent: Wednesday 13 July 2011 16:12:27
> >>>>>>> Subject: Re: [Moses-support] Using Moses language models
> >>>>>>> The definition of unknown is that the word you asked for (the
> >>>>>>> rightmost one) is mapped to <unk>, i.e. an OOV.
> >>>>>>> 
> >>>>>>> Are you looking for:
> >>>>>>> 
> >>>>>>> 1) Length of n-gram matched in the model
> >>>>>>> 
> >>>>>>> or
> >>>>>>> 
> >>>>>>> 2) Length of state you must keep for valid continuation to
> >>>>>>> the right
> >>>>>>> 
> >>>>>>> These are slightly different things due to state
> >>>>>>> minimization. The moses abstraction layer does not return
> >>>>>>> either in a general way.
> >>>>>>> However, if you're using KenLM, #2 is in the returned state's
> >>>>>>> valid_length_. Further, #1 is in
> >>>>>>> FullScoreReturn.ngram_length. So if you call KenLM directly
> >>>>>>> these are easy to obtain (and you can decide whether to
> >>>>>>> expose them through the Moses abstraction layer).
> >>>>>>> 
> >>>>>>> Outside the decoder, you can run
> >>>>>>> 
> >>>>>>> kenlm/query model_file null
> >>>>>>> 
> >>>>>>> then provide your trigrams on stdin.
> >>>>>>> 
> >>>>>>> Here's an example with kenlm/query kenlm/lm/test.arpa null
> >>>>>>> 
> >>>>>>> looking on a
> >>>>>>> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
> >>>>>>> Total: -1.79818 OOV: 0
> >>>>>>> 
> >>>>>>> The format is "word=vocab_id ngram_length score". So this is
> >>>>>>> a trigram in the model because "a=5 3" appears.
> >>>>>>> 
> >>>>>>> On 07/13/11 08:50, Marc LEGENDRE wrote:
> >>>>>>>> Hello,
> >>>>>>>> 
> >>>>>>>> I am trying to use the language models loaded by Moses.
> >>>>>>>> 
> >>>>>>>> I am using a 3-gram LM, and I need to know whether it
> >>>>>>>> contains a given N-gram or not.
> >>>>>>>> I tried to play around with
> >>>>>>>> LanguageModelImplementation::GetValueForgotState(...),
> >>>>>>>> but the boolean 'unknown' in the returned structure does not
> >>>>>>>> seem to be what I'm looking for.
> >>>>>>>> 
> >>>>>>>> Is there any simple way of getting this piece of information?
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>> Regards,
> >>>>>>>> Marc Legendre
> >>>>>>>> _______________________________________________
> >>>>>>>> Moses-support mailing list
> >>>>>>>> [email protected]
> >>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support

