Thank you everyone for the help and explanations! I appreciate that very much! I got rid of my error by removing long sentences from my training data. I missed this step during preparation of this particular set.
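For anyone hitting the same crash, here is a minimal sketch of the length filtering I mean. Moses ships a clean-corpus-n.perl script for this step; the Python below is only an illustration of the idea, not that script. The function names and the 80-token limit are my own assumptions.

```python
from typing import Iterable, Iterator, Tuple


def filter_parallel(
    pairs: Iterable[Tuple[str, str]],
    min_len: int = 1,
    max_len: int = 80,  # assumed limit; pick what suits your aligner setup
) -> Iterator[Tuple[str, str]]:
    """Yield only sentence pairs whose token counts are within the limits.

    A pair is dropped if EITHER side is too short (e.g. empty) or too long,
    so both files stay aligned line by line.
    """
    for src, tgt in pairs:
        n_src = len(src.split())
        n_tgt = len(tgt.split())
        if min_len <= n_src <= max_len and min_len <= n_tgt <= max_len:
            yield src, tgt


def filter_files(src_in: str, tgt_in: str, src_out: str, tgt_out: str,
                 max_len: int = 80) -> int:
    """Filter two line-aligned UTF-8 files; return the number of pairs kept."""
    kept = 0
    with open(src_in, encoding="utf-8") as fs, \
         open(tgt_in, encoding="utf-8") as ft, \
         open(src_out, "w", encoding="utf-8") as out_s, \
         open(tgt_out, "w", encoding="utf-8") as out_t:
        pairs = ((s.rstrip("\n"), t.rstrip("\n")) for s, t in zip(fs, ft))
        for s, t in filter_parallel(pairs, max_len=max_len):
            out_s.write(s + "\n")
            out_t.write(t + "\n")
            kept += 1
    return kept
```

With hypothetical file names, a call would look like filter_files("corpus.ru", "corpus.en", "corpus.clean.ru", "corpus.clean.en"). Setting min_len to 1 also drops empty lines on either side, which covers the "null sentence" case mentioned further down the thread.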
On Sat, Jul 23, 2011 at 4:19 AM, Tom Hoar <[email protected]> wrote:
> This error was reported to the GIZA++ team as a Y2K bug with a fix similar
> to option 2 below that was tested on earlier and later versions of gcc. Not
> sure why the fix wasn't rolled into the GIZA++ trunk. MGIZA++ needs the same
> fix. I attached the diff that we apply to DoMY for both GIZA++ and MGIZA++
>
> Tom
>
> On Sat, 23 Jul 2011 02:53:53 +0700, Thu Vuong Hoai <[email protected]>
> wrote:
>
> it's issue 11 in code.google.com/giza-pp:
> http://code.google.com/p/giza-pp/issues/detail?id=11
>
> I know 2 solutions for this issue:
> 1. try to use a compiler with c99 such as gcc version 4.1 (you did it)
> 2. edit the source code like one comment in
> http://code.google.com/p/giza-pp/issues/detail?id=11, and you can use gcc
> version 4.4 and newer
> best regards.
>
> On Sat, Jul 23, 2011 at 1:17 AM, <[email protected]> wrote:
>>
>> Send Moses-support mailing list submissions to
>> [email protected]
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>> or, via email, send a message with subject or body 'help' to
>> [email protected]
>>
>> You can reach the person managing the list at
>> [email protected]
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Moses-support digest..."
>>
>> Today's Topics:
>>
>> 1. Re: GIZA++: glibc detected (Angelina Ivanova) (Joerg Tiedemann)
>> 2. Re: Using Moses language models (Kenneth Heafield)
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 22 Jul 2011 19:46:01 +0200
>> From: Joerg Tiedemann <[email protected]>
>> Subject: Re: [Moses-support] GIZA++: glibc detected (Angelina Ivanova)
>> To: Angelina Ivanova <[email protected]>
>> Cc: [email protected]
>> Message-ID: <[email protected]>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> I had a similar problem with g++ 4.4 (Giza++ crashed on some smaller
>> data sets). I found this
>> http://permalink.gmane.org/gmane.comp.nlp.moses.user/4079
>> and reverting to 4.1 removed the problem.
>>
>> There is also a comment
>> http://comments.gmane.org/gmane.comp.nlp.moses.user/4079
>> with a different solution.
>>
>> I hope this helps,
>> Jörg
>>
>> On Fri, Jul 22, 2011 at 7:09 PM, Angelina Ivanova <[email protected]>
>> wrote:
>> > Hello!
>> > Thank you for the fast reply. Yes, I saw some links on GIZA++, but I
>> > didn't find a solution or a hint about what can cause this error.
>> >
>> > My environment is:
>> > #62 UBUNTU 2.6.32-32-generic-pae
>> > Moses Built on Jan 28 2009
>> > gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
>> > giza-pp-v1.0.2
>> >
>> > However, I can run Moses successfully on the other data.
>> >
>> > On Fri, Jul 22, 2011 at 6:34 PM, Thu Vuong Hoai <[email protected]>
>> > wrote:
>> >> Hello,
>> >> I found your error in the issues page of Giza++, could you please check
>> >> this link? http://code.google.com/p/giza-pp/issues/detail?id=15, I've
>> >> thought it's not enough good for you but I want to ask about issue 11,
>> >> do you fix it? and could you plz, provide more information about your
>> >> environment?
>> >> On Fri, Jul 22, 2011 at 11:04 PM, <[email protected]>
>> >> wrote:
>> >>>
>> >>> Today's Topics:
>> >>>
>> >>> 1. Re: Using Moses language models (Barry Haddow)
>> >>> 2. Re: Using Moses language models (Marc LEGENDRE)
>> >>> 3. GIZA++: glibc detected (Angelina Ivanova)
>> >>>
>> >>> ----------------------------------------------------------------------
>> >>>
>> >>> Message: 1
>> >>> Date: Fri, 22 Jul 2011 09:14:47 +0100
>> >>> From: Barry Haddow <[email protected]>
>> >>> Subject: Re: [Moses-support] Using Moses language models
>> >>> To: [email protected], [email protected]
>> >>> Message-ID: <[email protected]>
>> >>> Content-Type: text/plain; charset="utf-8"
>> >>>
>> >>> On Friday 22 July 2011 03:50, Hieu Hoang wrote:
>> >>> > true, & there's no right answer to it.
>> >>> >
>> >>> > I suppose 1 goal of the trunk is to make sure that the core
>> >>> > functionality of translating isn't affected too much, in terms of
>> >>> > quality, speed, or memory. Another goal is to make sure not to
>> >>> > overburden the API with things no-one else uses or implements.
>> >>> >
>> >>> > therefore, i think a good strategy is to branch & do what you like
>> >>> >
>> >>>
>> >>> Hi Hieu
>> >>>
>> >>> I'm not sure I see the point of implementing this in a branch and never
>> >>> merging. That's not a branch, it's a fork. The point of doing a small
>> >>> change like this in a branch would be so that the LM interface experts
>> >>> (ie you and Ken and ...) could have a look at it before it gets merged in.
>> >>>
>> >>> As regards how to implement the interface changes, what would be the
>> >>> consequences of having other LM implementations throw an exception or an
>> >>> assert for ngram_length? I think returning -1 is a very bad idea,
>> >>> especially as the return value is probably a size_t, and returning 0
>> >>> could also lead to subtle and confusing behaviour. However if there is a
>> >>> return value with the semantics of "don't know" then that would be the
>> >>> ideal solution.
>> >>>
>> >>> cheers - Barry
>> >>>
>> >>> --
>> >>> The University of Edinburgh is a charitable body, registered in
>> >>> Scotland, with registration number SC005336.
>> >>>
>> >>> ------------------------------
>> >>>
>> >>> Message: 2
>> >>> Date: Fri, 22 Jul 2011 10:21:44 +0200 (CEST)
>> >>> From: Marc LEGENDRE <[email protected]>
>> >>> Subject: Re: [Moses-support] Using Moses language models
>> >>> To: [email protected]
>> >>> Cc: [email protected]
>> >>> Message-ID: <[email protected]>
>> >>> Content-Type: text/plain; charset=ISO-8859-15
>> >>>
>> >>> Well, we (me and the people I work with) were hoping not to have to
>> >>> maintain a modified version of Moses.
>> >>>
>> >>> Luckily, obviousness just hit me like a truck : if something is specific
>> >>> to a LM, it does not have to be in the top layer.
>> >>> Having a common interface does not prevent subclasses from having a
>> >>> specific behaviour, we could have a LanguageModelKen method, say
>> >>> GetValueForgotStateKen(...) which would return something specific, say a
>> >>> LMKenResult, which would contain a LMResult plus others things like, say,
>> >>> a ngram_length field :-).
>> >>> And the virtual GetValueForgotState() method would simply return the
>> >>> LMResult from there.
>> >>>
>> >>> This way, no need to break the high level API,
>> >>> and no extra maintenance cost for us (me and the peop... Well, you know).
>> >>>
>> >>> ----- Original message -----
>> >>> > From: "Hieu Hoang" <[email protected]>
>> >>> > To: "Kenneth Heafield" <[email protected]>
>> >>> > Cc: [email protected]
>> >>> > Sent: Friday, 22 July 2011 04:50:14
>> >>> > Subject: Re: [Moses-support] Using Moses language models
>> >>> >
>> >>> > true, & there's no right answer to it.
>> >>> >
>> >>> > I suppose 1 goal of the trunk is to make sure that the core
>> >>> > functionality of translating isn't affected too much, in terms of
>> >>> > quality, speed, or memory. Another goal is to make sure not to
>> >>> > overburden the API with things no-one else uses or implements.
>> >>> >
>> >>> > therefore, i think a good strategy is to branch & do what you like
>> >>> >
>> >>> > On 21 July 2011 22:46, Kenneth Heafield < [email protected] >
>> >>> > wrote:
>> >>> >
>> >>> > Marc makes a good point. When one language model provides more
>> >>> > information than do other language models, it's difficult to maintain
>> >>> > a common abstraction layer. Currently we're looking at n-gram length.
>> >>> > SRILM doesn't provide access to that (but you can get right-looking
>> >>> > state length which is usually the same thing).
>> >>> >
>> >>> > I'm working on making this issue more severe with left-looking state
>> >>> > optimization and explicit hypothesis bounds. How do we change the
>> >>> > decoder to use these features if not all of the language models
>> >>> > support them?
>> >>> >
>> >>> > Maybe another class in the language model hierarchy supporting these
>> >>> > additional features. But it's going to make the decoder look ugly if
>> >>> > you want to support both.
>> >>> >
>> >>> > On 07/21/11 11:14, Hieu Hoang wrote:
>> >>> > > hi marc,
>> >>> > >
>> >>> > > it'll be good for people to see your changes.
>> >>> > >
>> >>> > > i suppose you should create a branch and make your changes in there.
>> >>> > >
>> >>> > > If there are other people interested, you can point them to your
>> >>> > > branch. If more people are interested and it doesn't affect other
>> >>> > > people too much, then we can move it to trunk.
>> >>> > >
>> >>> > > i'll email you offline with svn details
>> >>> > >
>> >>> > > On 21/07/2011 15:16, Marc LEGENDRE wrote:
>> >>> > >> Alright, I gave this a try, and it did it for me.
>> >>> > >> With kenlm, it is a ridiculously straightforward modification,
>> >>> > >> but now I'm not sure how I can submit it :
>> >>> > >> on one hand, I am not a "machine translation guy" and I don't
>> >>> > >> imagine myself digging in every other LM to find how to set the
>> >>> > >> ngram_length value;
>> >>> > >> and on the other hand I would feel guilty to submit a 10-line
>> >>> > >> patch and say "Guys, I need this, would you mind committing it and
>> >>> > >> doing yourselves the necessary modifications in every other
>> >>> > >> wrapper ?"
>> >>> > >>
>> >>> > >> How do you, Moses developers, feel about this ?
>> >>> > >> Is it acceptable / outrageously stupid if I set the value to -1 in
>> >>> > >> the other wrappers, maybe with a TODO, and properly document it in
>> >>> > >> the super class ?
>> >>> > >>
>> >>> > >> ----- Original message -----
>> >>> > >>> From: "Kenneth Heafield" < [email protected] >
>> >>> > >>> To: [email protected]
>> >>> > >>> Sent: Wednesday, 13 July 2011 20:53:46
>> >>> > >>> Subject: Re: [Moses-support] Using Moses language models
>> >>> > >>>
>> >>> > >>> I'd suggest adding a ngram_length member to LMResult then
>> >>> > >>> modifying each model's wrapper (or just mine) to set that value.
>> >>> > >>>
>> >>> > >>> You're welcome to move stuff from LanguageModelKen.cpp to
>> >>> > >>> LanguageModelKen.h as necessary. I chose this setup to minimize
>> >>> > >>> unnecessary includes.
>> >>> > >>>
>> >>> > >>> Kenneth
>> >>> > >>>
>> >>> > >>> On 07/13/11 14:33, Marc LEGENDRE wrote:
>> >>> > >>>> Well, not only the header is not "public", so to speak, (which I
>> >>> > >>>> agree is not a major obstacle)
>> >>> > >>>> but also the desired pointer is a private member of the class,
>> >>> > >>>> and sadly lacks a getter.
>> >>> > >>>> As far as I know, it means that accessing it will involve
>> >>> > >>>> questionable C++ tricks.
>> >>> > >>>> (never tried, though)
>> >>> > >>>>
>> >>> > >>>> If modifying Moses is not too much of a chore, I'll give it a
>> >>> > >>>> thought.
>> >>> > >>>>
>> >>> > >>>> Anyway, thank you for your answers.
>> >>> > >>>>
>> >>> > >>>> ----- Original message -----
>> >>> > >>>>> From: "Hieu Hoang" < [email protected] >
>> >>> > >>>>> To: [email protected]
>> >>> > >>>>> Sent: Wednesday, 13 July 2011 18:40:11
>> >>> > >>>>> Subject: Re: [Moses-support] Using Moses language models
>> >>> > >>>>> i guess lm::Model is specific to the ken lm implementation. If
>> >>> > >>>>> you want to use it you should include the header yourself and
>> >>> > >>>>> cast whatever you need to get the pointer.
>> >>> > >>>>>
>> >>> > >>>>> if you're feeling generous, maybe you can extend the moses LM
>> >>> > >>>>> wrapper so that all LM implementations have the opportunity to
>> >>> > >>>>> return the length of the n-gram match.
>> >>> > >>>>>
>> >>> > >>>>> On 13/07/2011 21:51, Marc LEGENDRE wrote:
>> >>> > >>>>>> The length of the n-gram match is sufficient for what I want,
>> >>> > >>>>>> indeed. I figured out how to get it using kenlm directly, but
>> >>> > >>>>>> as I am running the decoder, I wanted to use the already
>> >>> > >>>>>> loaded LM.
>> >>> > >>>>>>
>> >>> > >>>>>> I first tried to dig my way through the Moses abstraction
>> >>> > >>>>>> layers to retrieve a pointer to a lm::Model from kenlm, but the
>> >>> > >>>>>> Moses::LanguageModelKen header is not part of the public
>> >>> > >>>>>> headers of Moses ; that's why I tried to use only the Moses
>> >>> > >>>>>> interface.
>> >>> > >>>>>>
>> >>> > >>>>>> (I did not mention this alternative ; if someone knows how to
>> >>> > >>>>>> get such a pointer, I can carry on from there)
>> >>> > >>>>>>
>> >>> > >>>>>> ----- Original message -----
>> >>> > >>>>>>> From: "Kenneth Heafield" < [email protected] >
>> >>> > >>>>>>> To: "Marc LEGENDRE" < [email protected] >
>> >>> > >>>>>>> Sent: Wednesday, 13 July 2011 16:12:27
>> >>> > >>>>>>> Subject: Re: [Moses-support] Using Moses language models
>> >>> > >>>>>>> The definition of unknown is that the word you asked for (the
>> >>> > >>>>>>> rightmost one) is mapped to <unk>, i.e. an OOV.
>> >>> > >>>>>>>
>> >>> > >>>>>>> Are you looking for:
>> >>> > >>>>>>>
>> >>> > >>>>>>> 1) Length of n-gram matched in the model
>> >>> > >>>>>>>
>> >>> > >>>>>>> or
>> >>> > >>>>>>>
>> >>> > >>>>>>> 2) Length of state you must keep for valid continuation to the
>> >>> > >>>>>>> right
>> >>> > >>>>>>>
>> >>> > >>>>>>> These are slightly different things due to state minimization.
>> >>> > >>>>>>> The moses abstraction layer does not return either in a general
>> >>> > >>>>>>> way. However, if you're using KenLM, #2 is in the returned
>> >>> > >>>>>>> state's valid_length_. Further, #1 is in
>> >>> > >>>>>>> FullScoreReturn.ngram_length. So if you call KenLM directly
>> >>> > >>>>>>> these are easy to obtain (and you can decide whether to expose
>> >>> > >>>>>>> them through the Moses abstraction layer).
>> >>> > >>>>>>>
>> >>> > >>>>>>> Outside the decoder, you can run
>> >>> > >>>>>>>
>> >>> > >>>>>>> kenlm/query model_file null
>> >>> > >>>>>>>
>> >>> > >>>>>>> then provide your trigrams on stdin.
>> >>> > >>>>>>>
>> >>> > >>>>>>> Here's an example with kenlm/query kenlm/lm/test.arpa null
>> >>> > >>>>>>>
>> >>> > >>>>>>> looking on a
>> >>> > >>>>>>> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
>> >>> > >>>>>>> Total: -1.79818 OOV: 0
>> >>> > >>>>>>>
>> >>> > >>>>>>> The format is "word=vocab_id ngram_length score". So this is a
>> >>> > >>>>>>> trigram in the model because "a=5 3" appears.
>> >>> > >>>>>>>
>> >>> > >>>>>>> On 07/13/11 08:50, Marc LEGENDRE wrote:
>> >>> > >>>>>>>> Hello,
>> >>> > >>>>>>>>
>> >>> > >>>>>>>> I am trying to use the language models loaded by Moses ;
>> >>> > >>>>>>>>
>> >>> > >>>>>>>> I am using a 3-gram LM, and I need to know whether it contains
>> >>> > >>>>>>>> a given N-gram or not.
>> >>> > >>>>>>>> I tried to play around with
>> >>> > >>>>>>>> LanguageModelImplementation::GetValueForgotState(...),
>> >>> > >>>>>>>> but the boolean 'unknown' in the returned structure does not
>> >>> > >>>>>>>> seem to be what I'm looking for.
>> >>> > >>>>>>>>
>> >>> > >>>>>>>> Is there any simple way of getting this piece of information ?
>> >>> > >>>>>>>>
>> >>> > >>>>>>>> Regards,
>> >>> > >>>>>>>> Marc Legendre
>> >>> > >>>>>>>> _______________________________________________
>> >>> > >>>>>>>> Moses-support mailing list
>> >>> > >>>>>>>> [email protected]
>> >>> > >>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>> >>>
>> >>> ------------------------------
>> >>>
>> >>> Message: 3
>> >>> Date: Fri, 22 Jul 2011 16:38:53 +0200
>> >>> From: Angelina Ivanova <[email protected]>
>> >>> Subject: [Moses-support] GIZA++: glibc detected
>> >>> To: [email protected]
>> >>> Message-ID:
>> >>> <cahklk21bie0unchhrtdvqe69ep0i5k83+jvrnm7woiohocx...@mail.gmail.com>
>> >>> Content-Type: text/plain; charset=ISO-8859-1
>> >>>
>> >>> Hello,
>> >>> I got the error below when I tried to align Russian to English. I
>> >>> searched the error in the Internet and found out that the cause of the
>> >>> problem could be in having a null sentence in the corpus. However, I
>> >>> didn't detect any null sentences in my corpus. The encoding is UTF8
>> >>> and all previous experiments with the corpus that contained the one
>> >>> from this as a subset, went smoothly. Could you please help me?
>> >>>
>> >>> *** glibc detected ***/moses/tools/bin/GIZA++: double free or
>> >>> corruption (out): 0x14901578 ***
>> >>> ======= Backtrace: =========
>> >>> [0x8166e81]
>> >>> [0x8168946]
>> >>> [0x813ebb1]
>> >>> [0x80e6fe9]
>> >>> [0x80d8420]
>> >>> [0x80da791]
>> >>> [0x806f55a]
>> >>> [0x80742e8]
>> >>> [0x814d9bb]
>> >>> [0x8048151]
>> >>> ======= Memory map: ========
>> >>> 00d4e000-00d4f000 r-xp 00000000 00:00 0          [vdso]
>> >>> 08048000-081f6000 r-xp 00000000 00:1e 1612751353 /moses/tools/bin/GIZA++
>> >>> 081f6000-081f8000 rw-p 001ae000 00:1e 1612751353 /moses/tools/bin/GIZA++
>> >>> 081f8000-081ff000 rw-p 00000000 00:00 0
>> >>> 082ce000-1580d000 rw-p 00000000 00:00 0          [heap]
>> >>> b5f00000-b5f23000 rw-p 00000000 00:00 0
>> >>> b5f23000-b6000000 ---p 00000000 00:00 0
>> >>> b6093000-b6106000 rw-p 00000000 00:00 0
>> >>> b6179000-b7099000 rw-p 00000000 00:00 0
>> >>> b70dd000-b7525000 rw-p 00000000 00:00 0
>> >>> b7561000-b76a7000 rw-p 00000000 00:00 0
>> >>> b76c0000-b7779000 rw-p 00000000 00:00 0
>> >>> bfb6a000-bfb7f000 rw-p 00000000 00:00 0          [stack]
>> >>> Exit code: 1
>> >>>
>> >>> ------------------------------
>> >>>
>> >>> End of Moses-support Digest, Vol 57, Issue 40
>> >>> *********************************************
>> >>
>> >> --
>> >> Thu.
>> >>
>>
>> --
>>
>> **********************************************************************************
>>  Jörg Tiedemann                     jorg.tiedem...@lingfil.uu.se
>>  Dep. of Linguistics and Philology  http://stp.lingfil.uu.se/~joerg/
>>  Uppsala University                 tel: +46 (0)18 - 471 1412
>>  Box 635, SE-751 26 Uppsala/SWEDEN  fax: +46 (0)18 - 471 1094
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Fri, 22 Jul 2011 14:18:21 -0400
>> From: Kenneth Heafield <[email protected]>
>> Subject: Re: [Moses-support] Using Moses language models
>> To: Marc LEGENDRE <[email protected]>
>> Cc: [email protected], [email protected]
>> Message-ID: <[email protected]>
>> Content-Type: text/plain; charset=ISO-8859-15
>>
>> Hi Marc,
>>
>> This sounds like a simple change, so a branch is probably too much
>> overhead. Please do one of the following:
>>
>> 1. Send a patch as generated by diff -rupN $old $new . Do a make clean
>> first.
>> 2. Attach the files you modified and send them, along with the revision
>> you based changes on.
>> 3. Make a branch (if you already did).
>>
>> Thanks,
>>
>> Kenneth
>>
>> On 07/22/11 04:21, Marc LEGENDRE wrote:
>> > Well, we (me and the people I work with) were hoping not to have to
>> > maintain a modified version of Moses.
>> >
>> > Luckily, obviousness just hit me like a truck : if something is specific
>> > to a LM, it does not have to be in the top layer.
>> > Having a common interface does not prevent subclasses from having a
>> > specific behaviour, we could have a LanguageModelKen method, say
>> > GetValueForgotStateKen(...) which would return something specific, say a
>> > LMKenResult, which would contain a LMResult plus others things like, say,
>> > a ngram_length field :-).
>> > And the virtual GetValueForgotState() method would simply return the
>> > LMResult from there.
>> >
>> > This way, no need to break the high level API,
>> > and no extra maintenance cost for us (me and the peop... Well, you
>> > know).
>>
>> ------------------------------
>>
>> End of Moses-support Digest, Vol 57, Issue 44
>> *********************************************
>
> --
> Thu.
>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
