I had a similar problem with g++ 4.4 (GIZA++ crashed on some smaller data sets). I found this thread, http://permalink.gmane.org/gmane.comp.nlp.moses.user/4079, and reverting to g++ 4.1 removed the problem.
There is also a comment at http://comments.gmane.org/gmane.comp.nlp.moses.user/4079 with a different solution.

I hope this helps,
Jörg

On Fri, Jul 22, 2011 at 7:09 PM, Angelina Ivanova <[email protected]> wrote:
> Hello!
> Thank you for the fast reply. Yes, I saw some links on GIZA++, but I didn't find a solution or a hint about what could cause this error.
>
> My environment is:
> #62 UBUNTU 2.6.32-32-generic-pae
> Moses Built on Jan 28 2009
> gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
> giza-pp-v1.0.2
>
> However, I can run Moses successfully on other data.
>
> On Fri, Jul 22, 2011 at 6:34 PM, Thu Vuong Hoai <[email protected]> wrote:
>> Hello,
>> I found your error on the GIZA++ issues page; could you please check this link: http://code.google.com/p/giza-pp/issues/detail?id=15? I suspect it may not be enough for you, but I want to ask about issue 11: did you fix it? And could you please provide more information about your environment?
>> On Fri, Jul 22, 2011 at 11:04 PM, <[email protected]> wrote:
>>>
>>> Today's Topics:
>>>
>>>    1. Re: Using Moses language models (Barry Haddow)
>>>    2. Re: Using Moses language models (Marc LEGENDRE)
>>>    3. GIZA++: glibc detected (Angelina Ivanova)
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Fri, 22 Jul 2011 09:14:47 +0100
>>> From: Barry Haddow <[email protected]>
>>> Subject: Re: [Moses-support] Using Moses language models
>>> To: [email protected], [email protected]
>>> Message-ID: <[email protected]>
>>> Content-Type: text/plain; charset="utf-8"
>>>
>>> On Friday 22 July 2011 03:50, Hieu Hoang wrote:
>>> > true, & there's no right answer to it.
>>> >
>>> > I suppose one goal of the trunk is to make sure that the core functionality of translating isn't affected too much, in terms of quality, speed, or memory. Another goal is not to overburden the API with things no one else uses or implements.
>>> >
>>> > Therefore, I think a good strategy is to branch & do what you like.
>>> >
>>>
>>> Hi Hieu
>>>
>>> I'm not sure I see the point of implementing this in a branch and never merging. That's not a branch, it's a fork. The point of doing a small change like this in a branch would be so that the LM interface experts (i.e. you and Ken and ...) could have a look at it before it gets merged in.
>>>
>>> As regards how to implement the interface changes, what would be the consequences of having other LM implementations throw an exception or an assert for ngram_length? I think returning -1 is a very bad idea, especially as the return value is probably a size_t, and returning 0 could also lead to subtle and confusing behaviour. However, if there is a return value with the semantics of "don't know", then that would be the ideal solution.
>>>
>>> cheers - Barry
>>>
>>> --
>>> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
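Barry's "don't know" return value can be illustrated with a minimal C++17 sketch. It assumes a simplified, hypothetical `LMResult` whose `ngram_length` is a `std::optional<std::size_t>`; the real Moses `LMResult` differs, and `ScoreWithKnownLength`, `ScoreWithoutLength` and `ContainsFullNgram` are illustrative names only. A wrapper that cannot report the length leaves the optional empty instead of returning -1 (which wraps around to the maximum `size_t` value) or 0.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical, simplified stand-in for the Moses LMResult structure.
struct LMResult {
  float score = 0.0f;
  bool unknown = false;
  // Empty means "this language model cannot tell"; a value is the length
  // of the n-gram actually matched in the model.
  std::optional<std::size_t> ngram_length;
};

// Hypothetical wrapper for a backend that exposes the matched length.
LMResult ScoreWithKnownLength(float score, std::size_t matched) {
  LMResult r;
  r.score = score;
  r.ngram_length = matched;
  return r;
}

// Hypothetical wrapper for a backend that does not expose it.
LMResult ScoreWithoutLength(float score) {
  LMResult r;
  r.score = score;
  // ngram_length stays empty: "don't know", rather than -1 or 0.
  return r;
}

// Caller side: was the full n-gram of the given order found in the model?
// Returns an empty optional when the backend cannot say.
std::optional<bool> ContainsFullNgram(const LMResult &r, std::size_t order) {
  if (!r.ngram_length) return std::nullopt;
  return *r.ngram_length == order;
}
```

Callers are forced to handle the "unknown" case explicitly, which avoids the subtle behaviour Barry warns about when a sentinel like 0 is mistaken for a real length.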
>>>
>>>
>>> ------------------------------
>>>
>>> Message: 2
>>> Date: Fri, 22 Jul 2011 10:21:44 +0200 (CEST)
>>> From: Marc LEGENDRE <[email protected]>
>>> Subject: Re: [Moses-support] Using Moses language models
>>> To: [email protected]
>>> Cc: [email protected]
>>> Message-ID:
>>> <[email protected]>
>>> Content-Type: text/plain; charset=ISO-8859-15
>>>
>>> Well, we (me and the people I work with) were hoping not to have to maintain a modified version of Moses.
>>>
>>> Luckily, obviousness just hit me like a truck: if something is specific to an LM, it does not have to be in the top layer. Having a common interface does not prevent subclasses from having specific behaviour. We could have a LanguageModelKen method, say GetValueForgotStateKen(...), which would return something specific, say an LMKenResult, which would contain an LMResult plus other things like, say, an ngram_length field :-). And the virtual GetValueForgotState() method would simply return the LMResult from there.
>>>
>>> This way, there is no need to break the high-level API, and no extra maintenance cost for us (me and the peop... well, you know).
>>>
>>> ----- Original Message -----
>>> > From: "Hieu Hoang" <[email protected]>
>>> > To: "Kenneth Heafield" <[email protected]>
>>> > Cc: [email protected]
>>> > Sent: Friday 22 July 2011 04:50:14
>>> > Subject: Re: [Moses-support] Using Moses language models
>>> >
>>> > true, & there's no right answer to it.
>>> >
>>> > I suppose one goal of the trunk is to make sure that the core functionality of translating isn't affected too much, in terms of quality, speed, or memory. Another goal is not to overburden the API with things no one else uses or implements.
>>> >
>>> > Therefore, I think a good strategy is to branch & do what you like.
>>> >
>>> > On 21 July 2011 22:46, Kenneth Heafield < [email protected] > wrote:
>>> >
>>> > Marc makes a good point.
>>> > When one language model provides more information than other language models do, it's difficult to maintain a common abstraction layer. Currently we're looking at n-gram length. SRILM doesn't provide access to that (but you can get the right-looking state length, which is usually the same thing).
>>> >
>>> > I'm working on making this issue more severe with left-looking state optimization and explicit hypothesis bounds. How do we change the decoder to use these features if not all of the language models support them?
>>> >
>>> > Maybe another class in the language model hierarchy could support these additional features. But it's going to make the decoder look ugly if you want to support both.
>>> >
>>> > On 07/21/11 11:14, Hieu Hoang wrote:
>>> > > hi marc,
>>> > >
>>> > > it'll be good for people to see your changes.
>>> > >
>>> > > I suppose you should create a branch and make your changes in there.
>>> > >
>>> > > If other people are interested, you can point them to your branch. If more people are interested and it doesn't affect other people too much, then we can move it to trunk.
>>> > >
>>> > > I'll email you offline with svn details.
>>> > >
>>> > > On 21/07/2011 15:16, Marc LEGENDRE wrote:
>>> > >> Alright, I gave this a try, and it worked for me. With kenlm, it is a ridiculously straightforward modification, but now I'm not sure how I can submit it: on the one hand, I am not a "machine translation guy" and I don't imagine myself digging into every other LM to find out how to set the ngram_length value; on the other hand, I would feel guilty submitting a 10-line patch and saying "Guys, I need this, would you mind committing it and making the necessary modifications in every other wrapper yourselves?"
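Marc's later proposal in Message 2 (keep `GetValueForgotState()` as the common virtual interface and add a KenLM-only `GetValueForgotStateKen()` returning a richer `LMKenResult`) avoids touching every other wrapper. The sketch below is only an illustration under assumptions: the class layout and the `std::vector<std::string>` context parameter are simplified and hypothetical, not the real Moses hierarchy, and the score and length it returns are placeholders rather than a real KenLM query.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for the result type shared by all LM wrappers.
struct LMResult {
  float score = 0.0f;
  bool unknown = false;
};

// KenLM-specific result: the common result plus extra fields.
struct LMKenResult {
  LMResult base;
  std::size_t ngram_length = 0;
};

// Common interface, unchanged for every language model.
class LanguageModelImplementation {
 public:
  virtual ~LanguageModelImplementation() = default;
  virtual LMResult GetValueForgotState(
      const std::vector<std::string> &context) const = 0;
};

class LanguageModelKen : public LanguageModelImplementation {
 public:
  // KenLM-only entry point returning the richer result.
  LMKenResult GetValueForgotStateKen(
      const std::vector<std::string> &context) const {
    LMKenResult r;
    // Placeholder values only: a real wrapper would query KenLM here and
    // copy FullScoreReturn.ngram_length into r.ngram_length. This stub
    // pretends the whole context matched.
    r.base.score = -1.0f;
    r.ngram_length = context.size();
    return r;
  }

  // The generic virtual method forwards and drops the extras, so the
  // high-level API does not change.
  LMResult GetValueForgotState(
      const std::vector<std::string> &context) const override {
    return GetValueForgotStateKen(context).base;
  }
};
```

Code that only knows the common interface keeps calling `GetValueForgotState()`, while KenLM-aware code can call the extended method directly; no other wrapper needs a dummy `ngram_length`.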
>>> > >>
>>> > >> How do you, Moses developers, feel about this? Is it acceptable, or outrageously stupid, if I set the value to -1 in the other wrappers, maybe with a TODO, and properly document it in the superclass?
>>> > >>
>>> > >> ----- Original Message -----
>>> > >>> From: "Kenneth Heafield" < [email protected] >
>>> > >>> To: [email protected]
>>> > >>> Sent: Wednesday 13 July 2011 20:53:46
>>> > >>> Subject: Re: [Moses-support] Using Moses language models
>>> > >>>
>>> > >>> I'd suggest adding an ngram_length member to LMResult, then modifying each model's wrapper (or just mine) to set that value.
>>> > >>>
>>> > >>> You're welcome to move stuff from LanguageModelKen.cpp to LanguageModelKen.h as necessary. I chose this setup to minimize unnecessary includes.
>>> > >>>
>>> > >>> Kenneth
>>> > >>>
>>> > >>> On 07/13/11 14:33, Marc LEGENDRE wrote:
>>> > >>>> Well, not only is the header not "public", so to speak (which I agree is not a major obstacle), but the desired pointer is also a private member of the class, and sadly lacks a getter. As far as I know, that means accessing it will involve questionable C++ tricks (never tried, though).
>>> > >>>>
>>> > >>>> If modifying Moses is not too much of a chore, I'll give it a thought.
>>> > >>>>
>>> > >>>> Anyway, thank you for your answers.
>>> > >>>>
>>> > >>>> ----- Original Message -----
>>> > >>>>> From: "Hieu Hoang" < [email protected] >
>>> > >>>>> To: [email protected]
>>> > >>>>> Sent: Wednesday 13 July 2011 18:40:11
>>> > >>>>> Subject: Re: [Moses-support] Using Moses language models
>>> > >>>>> I guess lm::Model is specific to the KenLM implementation. If you want to use it, you should include the header yourself and cast whatever you need to get the pointer.
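Hieu's "include the header yourself and cast" suggestion is usually done with `dynamic_cast`, which yields a null pointer when the generic language model object is not actually the KenLM wrapper. The classes below are minimal hypothetical stand-ins, not the real Moses headers, and `LastNgramLength()` is an invented accessor (Marc reports that the real class has no such getter, so in practice the wrapper would first need one).

```cpp
#include <cstddef>
#include <optional>

// Hypothetical minimal stand-ins for the generic and KenLM-specific classes.
class LanguageModelImplementation {
 public:
  virtual ~LanguageModelImplementation() = default;  // makes the base polymorphic
};

class LanguageModelKen : public LanguageModelImplementation {
 public:
  explicit LanguageModelKen(std::size_t last_length)
      : last_length_(last_length) {}
  // Hypothetical accessor; not part of the real Moses class.
  std::size_t LastNgramLength() const { return last_length_; }

 private:
  std::size_t last_length_;
};

class LanguageModelOther : public LanguageModelImplementation {};

// Returns the KenLM-specific value only when the object really is a
// LanguageModelKen; dynamic_cast yields nullptr for any other type.
std::optional<std::size_t> TryNgramLength(const LanguageModelImplementation *lm) {
  if (const auto *ken = dynamic_cast<const LanguageModelKen *>(lm)) {
    return ken->LastNgramLength();
  }
  return std::nullopt;
}
```

Unlike a `static_cast`, the checked cast fails safely when a different language model (SRILM, IRSTLM, ...) is loaded, at the cost of tying the calling code to the KenLM wrapper's header.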
>>> > >>>>>
>>> > >>>>> If you're feeling generous, maybe you can extend the Moses LM wrapper so that all LM implementations have the opportunity to return the length of the n-gram match.
>>> > >>>>>
>>> > >>>>> On 13/07/2011 21:51, Marc LEGENDRE wrote:
>>> > >>>>>> The length of the n-gram match is sufficient for what I want, indeed. I figured out how to get it using kenlm directly, but as I am running the decoder, I wanted to use the already loaded LM.
>>> > >>>>>>
>>> > >>>>>> I first tried to dig my way through the Moses abstraction layers to retrieve a pointer to an lm::Model from kenlm, but the Moses::LanguageModelKen header is not part of the public headers of Moses; that's why I tried to use only the Moses interface.
>>> > >>>>>>
>>> > >>>>>> (I did not mention this alternative; if someone knows how to get such a pointer, I can carry on from there.)
>>> > >>>>>>
>>> > >>>>>> ----- Original Message -----
>>> > >>>>>>> From: "Kenneth Heafield" < [email protected] >
>>> > >>>>>>> To: "Marc LEGENDRE" < [email protected] >
>>> > >>>>>>> Sent: Wednesday 13 July 2011 16:12:27
>>> > >>>>>>> Subject: Re: [Moses-support] Using Moses language models
>>> > >>>>>>> The definition of unknown is that the word you asked for (the rightmost one) is mapped to <unk>, i.e. an OOV.
>>> > >>>>>>>
>>> > >>>>>>> Are you looking for:
>>> > >>>>>>>
>>> > >>>>>>> 1) Length of n-gram matched in the model
>>> > >>>>>>>
>>> > >>>>>>> or
>>> > >>>>>>>
>>> > >>>>>>> 2) Length of state you must keep for valid continuation to the right
>>> > >>>>>>>
>>> > >>>>>>> These are slightly different things due to state minimization.
>>> > >>>>>>> The Moses abstraction layer does not return either in a general way. However, if you're using KenLM, #2 is in the returned state's valid_length_. Further, #1 is in FullScoreReturn.ngram_length. So if you call KenLM directly, these are easy to obtain (and you can decide whether to expose them through the Moses abstraction layer).
>>> > >>>>>>>
>>> > >>>>>>> Outside the decoder, you can run
>>> > >>>>>>>
>>> > >>>>>>> kenlm/query model_file null
>>> > >>>>>>>
>>> > >>>>>>> then provide your trigrams on stdin.
>>> > >>>>>>>
>>> > >>>>>>> Here's an example with kenlm/query kenlm/lm/test.arpa null
>>> > >>>>>>>
>>> > >>>>>>> looking on a
>>> > >>>>>>> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
>>> > >>>>>>> Total: -1.79818 OOV: 0
>>> > >>>>>>>
>>> > >>>>>>> The format is "word=vocab_id ngram_length score". So this is a trigram in the model because "a=5 3" appears.
>>> > >>>>>>>
>>> > >>>>>>> On 07/13/11 08:50, Marc LEGENDRE wrote:
>>> > >>>>>>>> Hello,
>>> > >>>>>>>>
>>> > >>>>>>>> I am trying to use the language models loaded by Moses.
>>> > >>>>>>>>
>>> > >>>>>>>> I am using a 3-gram LM, and I need to know whether it contains a given n-gram or not. I tried to play around with LanguageModelImplementation::GetValueForgotState(...), but the boolean 'unknown' in the returned structure does not seem to be what I'm looking for.
>>> > >>>>>>>>
>>> > >>>>>>>> Is there any simple way of getting this piece of information?
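Kenneth's description of the `kenlm/query` output ("word=vocab_id ngram_length score") is enough to answer Marc's question outside the decoder: the whole n-gram is in the model when the last word's ngram_length equals the n-gram order. Below is a minimal parser for one output line, assuming whitespace-separated fields exactly as in Kenneth's example and stopping at a `Total:` summary if it appears on the same line; KenLM releases newer than the one in this thread may print a different layout, so treat the format as an assumption. `ParseQueryLine` and `LastNgramInModel` are hypothetical helper names.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One "word=vocab_id ngram_length score" triple from kenlm/query output.
struct QueryToken {
  std::string word;
  unsigned vocab_id = 0;
  std::size_t ngram_length = 0;
  double score = 0.0;
};

// Parses the per-word triples of one output line; stops at "Total:".
// std::stoul throws std::invalid_argument if a vocab id is not numeric.
std::vector<QueryToken> ParseQueryLine(const std::string &line) {
  std::istringstream in(line);
  std::vector<QueryToken> tokens;
  std::string head;
  while (in >> head && head != "Total:") {
    const std::size_t eq = head.rfind('=');  // the id itself contains no '='
    if (eq == std::string::npos) break;      // not a word=id field
    QueryToken t;
    t.word = head.substr(0, eq);
    t.vocab_id = static_cast<unsigned>(std::stoul(head.substr(eq + 1)));
    if (!(in >> t.ngram_length >> t.score)) break;  // malformed triple
    tokens.push_back(t);
  }
  return tokens;
}

// True when the last word was scored with an n-gram of the given order,
// i.e. the whole n-gram is present in the model.
bool LastNgramInModel(const std::string &line, std::size_t order) {
  const std::vector<QueryToken> tokens = ParseQueryLine(line);
  return !tokens.empty() && tokens.back().ngram_length == order;
}
```

For Kenneth's example line, the last triple is `a=5 3 -0.0483513`, so `LastNgramInModel(line, 3)` is true, matching his reading that "looking on a" is a trigram in the model.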
>>> > >>>>>>>>
>>> > >>>>>>>> Regards,
>>> > >>>>>>>> Marc Legendre
>>> > >>>>>>>> _______________________________________________
>>> > >>>>>>>> Moses-support mailing list
>>> > >>>>>>>> [email protected]
>>> > >>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>> ------------------------------
>>>
>>> Message: 3
>>> Date: Fri, 22 Jul 2011 16:38:53 +0200
>>> From: Angelina Ivanova <[email protected]>
>>> Subject: [Moses-support] GIZA++: glibc detected
>>> To: [email protected]
>>> Message-ID:
>>> <cahklk21bie0unchhrtdvqe69ep0i5k83+jvrnm7woiohocx...@mail.gmail.com>
>>> Content-Type: text/plain; charset=ISO-8859-1
>>>
>>> Hello,
>>> I got the error below when I tried to align Russian to English. I searched for the error on the Internet and found that the problem could be caused by a null (empty) sentence in the corpus. However, I didn't find any null sentences in my corpus. The encoding is UTF-8, and all previous experiments with a larger corpus that contained this one as a subset went smoothly. Could you please help me?
>>>
>>> *** glibc detected *** /moses/tools/bin/GIZA++: double free or corruption (out): 0x14901578 ***
>>> ======= Backtrace: =========
>>> [0x8166e81]
>>> [0x8168946]
>>> [0x813ebb1]
>>> [0x80e6fe9]
>>> [0x80d8420]
>>> [0x80da791]
>>> [0x806f55a]
>>> [0x80742e8]
>>> [0x814d9bb]
>>> [0x8048151]
>>> ======= Memory map: ========
>>> 00d4e000-00d4f000 r-xp 00000000 00:00 0 [vdso]
>>> 08048000-081f6000 r-xp 00000000 00:1e 1612751353 /moses/tools/bin/GIZA++
>>> 081f6000-081f8000 rw-p 001ae000 00:1e 1612751353 /moses/tools/bin/GIZA++
>>> 081f8000-081ff000 rw-p 00000000 00:00 0
>>> 082ce000-1580d000 rw-p 00000000 00:00 0 [heap]
>>> b5f00000-b5f23000 rw-p 00000000 00:00 0
>>> b5f23000-b6000000 ---p 00000000 00:00 0
>>> b6093000-b6106000 rw-p 00000000 00:00 0
>>> b6179000-b7099000 rw-p 00000000 00:00 0
>>> b70dd000-b7525000 rw-p 00000000 00:00 0
>>> b7561000-b76a7000 rw-p 00000000 00:00 0
>>> b76c0000-b7779000 rw-p 00000000 00:00 0
>>> bfb6a000-bfb7f000 rw-p 00000000 00:00 0 [stack]
>>> Exit code: 1
>>>
>>> ------------------------------
>>>
>>> End of Moses-support Digest, Vol 57, Issue 40
>>> *********************************************
>>
>>
>>
>> --
>> Thu.
>>
>
--
**********************************************************************************
Jörg Tiedemann                      [email protected]
Dep. of Linguistics and Philology   http://stp.lingfil.uu.se/~joerg/
Uppsala University                  tel: +46 (0)18 - 471 1412
Box 635, SE-751 26 Uppsala/SWEDEN   fax: +46 (0)18 - 471 1094
