Hello,

I found your error on the GIZA++ issues page; could you please check this link: http://code.google.com/p/giza-pp/issues/detail?id=15. I suspect it may not be enough for you, but I also want to ask about issue 11: have you fixed it? And could you please provide more information about your environment?
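Since the quoted thread mentions that this crash can be caused by a null (empty) sentence in the corpus, it may be worth checking both sides of the corpus for empty lines and mismatched line counts before digging further. Below is a minimal Python sketch under that assumption. It is not part of GIZA++ or Moses, and the script and file names are hypothetical. Moses' own `clean-corpus-n.perl` is the usual tool for cleaning a corpus before training.

```python
"""Sanity-check a sentence-aligned parallel corpus before running GIZA++.

Hypothetical helper: flags empty sentences on either side and a
mismatch in the number of lines between the two files.
"""
import sys
from itertools import zip_longest


def check_parallel_corpus(src_lines, tgt_lines):
    """Return a list of (line_number, problem) tuples, 1-indexed."""
    problems = []
    for lineno, (src, tgt) in enumerate(zip_longest(src_lines, tgt_lines), start=1):
        if src is None or tgt is None:
            # One file ran out before the other; the rest is misaligned anyway.
            problems.append((lineno, "line counts differ"))
            break
        if not src.strip():  # empty or whitespace-only line
            problems.append((lineno, "empty source sentence"))
        if not tgt.strip():
            problems.append((lineno, "empty target sentence"))
    return problems


if __name__ == "__main__":
    # Usage (hypothetical file names): python check_corpus.py corpus.ru corpus.en
    with open(sys.argv[1], encoding="utf-8") as f_src, \
         open(sys.argv[2], encoding="utf-8") as f_tgt:
        for lineno, problem in check_parallel_corpus(f_src, f_tgt):
            print(f"line {lineno}: {problem}")
```

Run it on the exact tokenized files you align. If either file contains invalid UTF-8, reading it raises `UnicodeDecodeError`, which also points to a corpus problem.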
On Fri, Jul 22, 2011 at 11:04 PM, <[email protected]> wrote:
> Send Moses-support mailing list submissions to
> [email protected]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://mailman.mit.edu/mailman/listinfo/moses-support
> or, via email, send a message with subject or body 'help' to
> [email protected]
>
> You can reach the person managing the list at
> [email protected]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Moses-support digest..."
>
>
> Today's Topics:
>
> 1. Re: Using Moses language models (Barry Haddow)
> 2. Re: Using Moses language models (Marc LEGENDRE)
> 3. GIZA++: glibc detected (Angelina Ivanova)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 22 Jul 2011 09:14:47 +0100
> From: Barry Haddow <[email protected]>
> Subject: Re: [Moses-support] Using Moses language models
> To: [email protected], [email protected]
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset="utf-8"
>
> On Friday 22 July 2011 03:50, Hieu Hoang wrote:
> > true, & there's no right answer to it.
> >
> > I suppose 1 goal of the trunk is to make sure that the core functionality of translating isn't affected too much, in terms of quality, speed, or memory. Another goal is to make sure not to overburden the API with things no-one else uses or implements.
> >
> > therefore, i think a good strategy is to branch & do what you like
>
> Hi Hieu
>
> I'm not sure I see the point of implementing this in a branch and never merging. That's not a branch, it's a fork. The point of doing a small change like this in a branch would be so that the LM interface experts (i.e. you and Ken and ...) could have a look at it before it gets merged in.
>
> As regards how to implement the interface changes, what would be the consequences of having other LM implementations throw an exception or an assert for ngram_length?
> I think returning -1 is a very bad idea, especially as the return value is probably a size_t, and returning 0 could also lead to subtle and confusing behaviour. However, if there is a return value with the semantics of "don't know", then that would be the ideal solution.
>
> cheers - Barry
>
> --
> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 22 Jul 2011 10:21:44 +0200 (CEST)
> From: Marc LEGENDRE <[email protected]>
> Subject: Re: [Moses-support] Using Moses language models
> To: [email protected]
> Cc: [email protected]
> Message-ID:
> <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-15
>
> Well, we (me and the people I work with) were hoping not to have to maintain a modified version of Moses.
>
> Luckily, the obvious just hit me like a truck: if something is specific to a LM, it does not have to be in the top layer.
> Having a common interface does not prevent subclasses from having a specific behaviour; we could have a LanguageModelKen method, say GetValueForgotStateKen(...), which would return something specific, say a LMKenResult, which would contain a LMResult plus other things like, say, a ngram_length field :-).
> And the virtual GetValueForgotState() method would simply return the LMResult from there.
>
> This way, no need to break the high-level API, and no extra maintenance cost for us (me and the peop... Well, you know).
>
> ----- Original Message -----
> > From: "Hieu Hoang" <[email protected]>
> > To: "Kenneth Heafield" <[email protected]>
> > Cc: [email protected]
> > Sent: Friday, 22 July 2011 04:50:14
> > Subject: Re: [Moses-support] Using Moses language models
> >
> >
> > true, & there's no right answer to it.
> >
> > I suppose 1 goal of the trunk is to make sure that the core functionality of translating isn't affected too much, in terms of quality, speed, or memory. Another goal is to make sure not to overburden the API with things no-one else uses or implements.
> >
> > therefore, i think a good strategy is to branch & do what you like
> >
> >
> > On 21 July 2011 22:46, Kenneth Heafield < [email protected] > wrote:
> >
> > Marc makes a good point. When one language model provides more information than do other language models, it's difficult to maintain a common abstraction layer. Currently we're looking at n-gram length. SRILM doesn't provide access to that (but you can get right-looking state length, which is usually the same thing).
> >
> > I'm working on making this issue more severe with left-looking state optimization and explicit hypothesis bounds. How do we change the decoder to use these features if not all of the language models support them?
> >
> > Maybe another class in the language model hierarchy supporting these additional features. But it's going to make the decoder look ugly if you want to support both.
> >
> >
> > On 07/21/11 11:14, Hieu Hoang wrote:
> > > hi marc,
> > >
> > > it'll be good for people to see your changes.
> > >
> > > i suppose you should create a branch and make your changes in there.
> > >
> > > If there are other people interested, you can point them to your branch.
> > > If more people are interested and it doesn't affect other people too much, then we can move it to trunk.
> > >
> > > i'll email you offline with svn details
> > >
> > > On 21/07/2011 15:16, Marc LEGENDRE wrote:
> > >> Alright, I gave this a try, and it did it for me.
> > >> With kenlm, it is a ridiculously straightforward modification, but now I'm not sure how I can submit it:
> > >> on one hand, I am not a "machine translation guy" and I don't imagine myself digging into every other LM to find out how to set the ngram_length value;
> > >> and on the other hand I would feel guilty submitting a 10-line patch and saying "Guys, I need this, would you mind committing it and making the necessary modifications in every other wrapper yourselves?"
> > >>
> > >> How do you, Moses developers, feel about this?
> > >> Is it acceptable / outrageously stupid if I set the value to -1 in the other wrappers, maybe with a TODO, and properly document it in the super class?
> > >>
> > >> ----- Original Message -----
> > >>> From: "Kenneth Heafield" < [email protected] >
> > >>> To: [email protected]
> > >>> Sent: Wednesday, 13 July 2011 20:53:46
> > >>> Subject: Re: [Moses-support] Using Moses language models
> > >>>
> > >>> I'd suggest adding a ngram_length member to LMResult, then modifying each model's wrapper (or just mine) to set that value.
> > >>>
> > >>> You're welcome to move stuff from LanguageModelKen.cpp to LanguageModelKen.h as necessary. I chose this setup to minimize unnecessary includes.
> > >>>
> > >>> Kenneth
> > >>>
> > >>> On 07/13/11 14:33, Marc LEGENDRE wrote:
> > >>>> Well, not only is the header not "public", so to speak (which I agree is not a major obstacle), but the desired pointer is also a private member of the class, and sadly lacks a getter.
> > >>>> As far as I know, that means accessing it will involve questionable C++ tricks.
> > >>>> (never tried, though)
> > >>>>
> > >>>> If modifying Moses is not too much of a chore, I'll give it a thought.
> > >>>>
> > >>>> Anyway, thank you for your answers.
> > >>>>
> > >>>> ----- Original Message -----
> > >>>>> From: "Hieu Hoang" < [email protected] >
> > >>>>> To: [email protected]
> > >>>>> Sent: Wednesday, 13 July 2011 18:40:11
> > >>>>> Subject: Re: [Moses-support] Using Moses language models
> > >>>>> i guess lm::Model is specific to the ken lm implementation. If you want to use it, you should include the header yourself and cast whatever you need to get the pointer.
> > >>>>>
> > >>>>> if you're feeling generous, maybe you can extend the moses LM wrapper so that all LM implementations have the opportunity to return the length of the n-gram match.
> > >>>>>
> > >>>>> On 13/07/2011 21:51, Marc LEGENDRE wrote:
> > >>>>>> The length of the n-gram match is sufficient for what I want, indeed.
> > >>>>>> I figured out how to get it using kenlm directly, but as I am running the decoder, I wanted to use the already loaded LM.
> > >>>>>>
> > >>>>>> I first tried to dig my way through the Moses abstraction layers to retrieve a pointer to a lm::Model from kenlm, but the Moses::LanguageModelKen header is not part of the public headers of Moses; that's why I tried to use only the Moses interface.
> > >>>>>>
> > >>>>>> (I did not mention this alternative; if someone knows how to get such a pointer, I can carry on from there.)
> > >>>>>>
> > >>>>>>
> > >>>>>> ----- Original Message -----
> > >>>>>>> From: "Kenneth Heafield" < [email protected] >
> > >>>>>>> To: "Marc LEGENDRE" < [email protected] >
> > >>>>>>> Sent: Wednesday, 13 July 2011 16:12:27
> > >>>>>>> Subject: Re: [Moses-support] Using Moses language models
> > >>>>>>> The definition of unknown is that the word you asked for (the rightmost one) is mapped to <unk>, i.e. an OOV.
> > >>>>>>>
> > >>>>>>> Are you looking for:
> > >>>>>>>
> > >>>>>>> 1) Length of n-gram matched in the model
> > >>>>>>>
> > >>>>>>> or
> > >>>>>>>
> > >>>>>>> 2) Length of state you must keep for valid continuation to the right
> > >>>>>>>
> > >>>>>>> These are slightly different things due to state minimization. The moses abstraction layer does not return either in a general way. However, if you're using KenLM, #2 is in the returned state's valid_length_. Further, #1 is in FullScoreReturn.ngram_length. So if you call KenLM directly these are easy to obtain (and you can decide whether to expose them through the Moses abstraction layer).
> > >>>>>>>
> > >>>>>>> Outside the decoder, you can run
> > >>>>>>>
> > >>>>>>> kenlm/query model_file null
> > >>>>>>>
> > >>>>>>> then provide your trigrams on stdin.
> > >>>>>>>
> > >>>>>>> Here's an example with kenlm/query kenlm/lm/test.arpa null
> > >>>>>>>
> > >>>>>>> looking on a
> > >>>>>>> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
> > >>>>>>> Total: -1.79818 OOV: 0
> > >>>>>>>
> > >>>>>>> The format is "word=vocab_id ngram_length score". So this is a trigram in the model because "a=5 3" appears.
> > >>>>>>>
> > >>>>>>> On 07/13/11 08:50, Marc LEGENDRE wrote:
> > >>>>>>>> Hello,
> > >>>>>>>>
> > >>>>>>>> I am trying to use the language models loaded by Moses;
> > >>>>>>>>
> > >>>>>>>> I am using a 3-gram LM, and I need to know whether it contains a given N-gram or not.
> > >>>>>>>> I tried to play around with LanguageModelImplementation::GetValueForgotState(...), but the boolean 'unknown' in the returned structure does not seem to be what I'm looking for.
> > >>>>>>>>
> > >>>>>>>> Is there any simple way of getting this piece of information?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Regards,
> > >>>>>>>> Marc Legendre
> > >>>>>>>> _______________________________________________
> > >>>>>>>> Moses-support mailing list
> > >>>>>>>> [email protected]
> > >>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 22 Jul 2011 16:38:53 +0200
> From: Angelina Ivanova <[email protected]>
> Subject: [Moses-support] GIZA++: glibc detected
> To: [email protected]
> Message-ID:
> <cahklk21bie0unchhrtdvqe69ep0i5k83+jvrnm7woiohocx...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hello,
> I got the error below when I tried to align Russian to English. I searched for the error on the Internet and found that the problem could be caused by a null sentence in the corpus. However, I didn't find any null sentences in my corpus. The encoding is UTF-8, and all previous experiments with a corpus that contained this one as a subset went smoothly. Could you please help me?
>
> *** glibc detected *** /moses/tools/bin/GIZA++: double free or corruption (out): 0x14901578 ***
> ======= Backtrace: =========
> [0x8166e81]
> [0x8168946]
> [0x813ebb1]
> [0x80e6fe9]
> [0x80d8420]
> [0x80da791]
> [0x806f55a]
> [0x80742e8]
> [0x814d9bb]
> [0x8048151]
> ======= Memory map: ========
> 00d4e000-00d4f000 r-xp 00000000 00:00 0 [vdso]
> 08048000-081f6000 r-xp 00000000 00:1e 1612751353 /moses/tools/bin/GIZA++
> 081f6000-081f8000 rw-p 001ae000 00:1e 1612751353 /moses/tools/bin/GIZA++
> 081f8000-081ff000 rw-p 00000000 00:00 0
> 082ce000-1580d000 rw-p 00000000 00:00 0 [heap]
> b5f00000-b5f23000 rw-p 00000000 00:00 0
> b5f23000-b6000000 ---p 00000000 00:00 0
> b6093000-b6106000 rw-p 00000000 00:00 0
> b6179000-b7099000 rw-p 00000000 00:00 0
> b70dd000-b7525000 rw-p 00000000 00:00 0
> b7561000-b76a7000 rw-p 00000000 00:00 0
> b76c0000-b7779000 rw-p 00000000 00:00 0
> bfb6a000-bfb7f000 rw-p 00000000 00:00 0 [stack]
> Exit code: 1
>
>
> ------------------------------
>
> End of Moses-support Digest, Vol 57, Issue 40
> *********************************************

--
Thu.
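For Marc's original question in the quoted thread (whether a 3-gram LM contains a given trigram), the `kenlm/query model_file null` route Kenneth describes can be scripted by reading its output. This is a minimal sketch that assumes the 2011 output format he quotes: `word=vocab_id ngram_length score` for each word, followed by `Total: ... OOV: ...`. Newer KenLM releases may print extra fields, and the function and type names here are hypothetical, not part of KenLM.

```python
"""Parse one line of `kenlm/query` output in the format quoted in the thread.

Assumed format (may differ in other KenLM versions):
    word=vocab_id ngram_length score ... Total: <score> OOV: <count>
"""
from typing import List, NamedTuple


class QueryToken(NamedTuple):
    word: str
    vocab_id: int
    ngram_length: int  # length of the n-gram matched in the model for this word
    score: float       # score as printed by query


def parse_query_line(line: str) -> List[QueryToken]:
    """Read word=id / length / score triples until 'Total:' or end of line."""
    fields = line.split()
    tokens = []
    i = 0
    while i + 2 < len(fields) and fields[i] != "Total:":
        # rpartition keeps any '=' inside the word itself.
        word, _, vocab_id = fields[i].rpartition("=")
        tokens.append(QueryToken(word, int(vocab_id), int(fields[i + 1]), float(fields[i + 2])))
        i += 3
    return tokens


def contains_full_ngram(tokens: List[QueryToken], order: int) -> bool:
    """True if the last word was scored with an n-gram of length `order`,
    i.e. the whole n-gram ending at that word is present in the model."""
    return bool(tokens) and tokens[-1].ngram_length >= order
```

Feed one trigram per line to `kenlm/query` and pass each output line to `parse_query_line`; `contains_full_ngram(tokens, 3)` then mirrors Kenneth's reading that `a=5 3` means the trigram is in the model. Because the parser stops at `Total:` or at the end of the line, it also copes with the total being printed on a separate line.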
