Thank you everyone for the help and explanations! I appreciate that very much!
I got rid of my error by removing long sentences from my training
data. I missed this step during preparation of this particular set.
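
For anyone who hits the same crash, here is a minimal sketch of such a length
filter. It is an illustration under assumptions, not the exact step used for
this corpus: the function name filter_long_pairs, whitespace tokenization, and
the 80-token limit are all hypothetical choices.

```python
def filter_long_pairs(src_lines, tgt_lines, max_len=80):
    """Keep only sentence pairs where both sides have 1..max_len tokens.

    max_len=80 is an assumed limit; pick whatever your aligner tolerates.
    Tokens are counted by splitting on whitespace (an assumption).
    """
    kept_src, kept_tgt = [], []
    for src, tgt in zip(src_lines, tgt_lines):
        n_src = len(src.split())
        n_tgt = len(tgt.split())
        # Drop the pair if either side is empty or too long, so that the
        # two sides of the parallel corpus stay line-aligned.
        if 1 <= n_src <= max_len and 1 <= n_tgt <= max_len:
            kept_src.append(src)
            kept_tgt.append(tgt)
    return kept_src, kept_tgt


if __name__ == "__main__":
    ru = ["это тест", "слово " * 200]
    en = ["this is a test", "word"]
    clean_ru, clean_en = filter_long_pairs(ru, en)
    print(len(clean_ru), len(clean_en))
```

Pairs are dropped as a unit, so the source and target files keep the same
number of lines after filtering.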


On Sat, Jul 23, 2011 at 4:19 AM, Tom Hoar
<[email protected]> wrote:
> This error was reported to the GIZA++ team as a Y2K bug with a fix similar
> to option 2 below that was tested on earlier and later versions of gcc. Not
> sure why the fix wasn't rolled into the GIZA++ trunk. MGIZA++ needs the same
> fix. I attached the diff that we apply to DoMY for both GIZA++ and MGIZA++
>
> Tom
>
>
>
> On Sat, 23 Jul 2011 02:53:53 +0700, Thu Vuong Hoai <[email protected]>
> wrote:
>
> This is issue 11 in the giza-pp tracker:
> http://code.google.com/p/giza-pp/issues/detail?id=11
>
> I know two solutions for this issue:
> 1. Use a compiler with c99, such as gcc version 4.1 (you did that).
> 2. Edit the source code as described in one of the comments at
> http://code.google.com/p/giza-pp/issues/detail?id=11; you can then use gcc
> version 4.4 and newer.
> Best regards.
>
> On Sat, Jul 23, 2011 at 1:17 AM, <[email protected]> wrote:
>>
>> Send Moses-support mailing list submissions to
>>        [email protected]
>>
>> To subscribe or unsubscribe via the World Wide Web, visit
>>        http://mailman.mit.edu/mailman/listinfo/moses-support
>> or, via email, send a message with subject or body 'help' to
>>        [email protected]
>>
>> You can reach the person managing the list at
>>        [email protected]
>>
>> When replying, please edit your Subject line so it is more specific
>> than "Re: Contents of Moses-support digest..."
>>
>>
>> Today's Topics:
>>
>>   1. Re: GIZA++: glibc detected (Angelina Ivanova) (Joerg Tiedemann)
>>   2. Re: Using Moses language models (Kenneth Heafield)
>>
>>
>> ----------------------------------------------------------------------
>>
>> Message: 1
>> Date: Fri, 22 Jul 2011 19:46:01 +0200
>> From: Joerg Tiedemann <[email protected]>
>> Subject: Re: [Moses-support] GIZA++: glibc detected (Angelina Ivanova)
>> To: Angelina Ivanova <[email protected]>
>> Cc: [email protected]
>> Message-ID:
>>        [email protected]>
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> I had a similar problem with g++ 4.4 (Giza++ crashed on some smaller
>> data sets). I found this
>> http://permalink.gmane.org/gmane.comp.nlp.moses.user/4079
>> and reverting to 4.1 removed the problem.
>>
>> There is also a comment
>> http://comments.gmane.org/gmane.comp.nlp.moses.user/4079
>> with a different solution.
>>
>> I hope this helps,
>> Jörg
>>
>>
>> On Fri, Jul 22, 2011 at 7:09 PM, Angelina Ivanova <[email protected]>
>> wrote:
>> > Hello!
>> > Thank you for the fast reply. Yes, I saw some links on GIZA++, but I
>> > didn't find a solution or a hint about what could cause this error.
>> >
>> > My environment is:
>> > #62 UBUNTU 2.6.32-32-generic-pae
>> > Moses Built on Jan 28 2009
>> > gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)
>> > giza-pp-v1.0.2
>> >
>> > However, I can run Moses successfully on the other data.
>> >
>> >
>> >
>> > On Fri, Jul 22, 2011 at 6:34 PM, Thu Vuong Hoai <[email protected]>
>> > wrote:
>> >> Hello,
>> >> I found your error on the GIZA++ issues page. Could you please check
>> >> this link? http://code.google.com/p/giza-pp/issues/detail?id=15
>> >> It may not be enough to solve your problem, so I also want to ask about
>> >> issue 11: did you fix it? And could you please provide more information
>> >> about your environment?
>> >> On Fri, Jul 22, 2011 at 11:04 PM, <[email protected]>
>> >> wrote:
>> >>>
>> >>>
>> >>> Today's Topics:
>> >>>
>> >>>   1. Re: Using Moses language models (Barry Haddow)
>> >>>   2. Re: Using Moses language models (Marc LEGENDRE)
>> >>>   3. GIZA++: glibc detected (Angelina Ivanova)
>> >>>
>> >>>
>> >>> ----------------------------------------------------------------------
>> >>>
>> >>> Message: 1
>> >>> Date: Fri, 22 Jul 2011 09:14:47 +0100
>> >>> From: Barry Haddow <[email protected]>
>> >>> Subject: Re: [Moses-support] Using Moses language models
>> >>> To: [email protected], [email protected]
>> >>> Message-ID: <[email protected]>
>> >>> Content-Type: text/plain; charset="utf-8"
>> >>>
>> >>> On Friday 22 July 2011 03:50, Hieu Hoang wrote:
>> >>> > true, & there's no right answer to it.
>> >>> >
>> >>> > I suppose 1 goal of the trunk is to make sure that the core
>> >>> > functionality of translating isn't affected too much, in terms of
>> >>> > quality, speed, or memory. Another goal is not to overburden the API
>> >>> > with things no-one else uses or implements.
>> >>> >
>> >>> > therefore, i think a good strategy is to branch & do what you like
>> >>> >
>> >>> >
>> >>>
>> >>> Hi Hieu
>> >>>
>> >>> I'm not sure I see the point of implementing this in a branch and never
>> >>> merging. That's not a branch, it's a fork. The point of doing a small
>> >>> change like this in a branch would be so that the LM interface experts
>> >>> (i.e. you and Ken and ...) could have a look at it before it gets
>> >>> merged in.
>> >>>
>> >>> As regards how to implement the interface changes, what would be the
>> >>> consequences of having other LM implementations throw an exception or
>> >>> an assert for ngram_length? I think returning -1 is a very bad idea,
>> >>> especially as the return value is probably a size_t, and returning 0
>> >>> could also lead to subtle and confusing behaviour. However if there is
>> >>> a return value with the semantics of "don't know" then that would be
>> >>> the ideal solution.
>> >>>
>> >>> cheers - Barry
>> >>>
>> >>> --
>> >>> The University of Edinburgh is a charitable body, registered in
>> >>> Scotland, with registration number SC005336.
>> >>>
>> >>>
>> >>>
>> >>> ------------------------------
>> >>>
>> >>> Message: 2
>> >>> Date: Fri, 22 Jul 2011 10:21:44 +0200 (CEST)
>> >>> From: Marc LEGENDRE <[email protected]>
>> >>> Subject: Re: [Moses-support] Using Moses language models
>> >>> To: [email protected]
>> >>> Cc: [email protected]
>> >>> Message-ID: <[email protected]>
>> >>> Content-Type: text/plain; charset=ISO-8859-15
>> >>>
>> >>> Well, we (me and the people I work with) were hoping not to have to
>> >>> maintain a modified version of Moses.
>> >>>
>> >>> Luckily, obviousness just hit me like a truck: if something is
>> >>> specific to a LM, it does not have to be in the top layer.
>> >>> Having a common interface does not prevent subclasses from having
>> >>> specific behaviour. We could have a LanguageModelKen method, say
>> >>> GetValueForgotStateKen(...), which would return something specific,
>> >>> say a LMKenResult, which would contain a LMResult plus other things
>> >>> like, say, a ngram_length field :-).
>> >>> And the virtual GetValueForgotState() method would simply return the
>> >>> LMResult from there.
>> >>>
>> >>> This way, no need to break the high level API,
>> >>> and no extra maintenance cost for us (me and the peop... Well, you
>> >>> know).
>> >>>
>> >>> ----- Original message -----
>> >>> > From: "Hieu Hoang" <[email protected]>
>> >>> > To: "Kenneth Heafield" <[email protected]>
>> >>> > Cc: [email protected]
>> >>> > Sent: Friday, 22 July 2011 04:50:14
>> >>> > Subject: Re: [Moses-support] Using Moses language models
>> >>> >
>> >>> >
>> >>> > true, & there's no right answer to it.
>> >>> >
>> >>> > I suppose 1 goal of the trunk is to make sure that the core
>> >>> > functionality of translating isn't affected too much, in terms of
>> >>> > quality, speed, or memory. Another goal is not to overburden
>> >>> > the API with things no-one else uses or implements.
>> >>> >
>> >>> > therefore, i think a good strategy is to branch & do what you like
>> >>> >
>> >>> >
>> >>> > On 21 July 2011 22:46, Kenneth Heafield < [email protected] >
>> >>> > wrote:
>> >>> >
>> >>> >
>> >>> > Marc makes a good point. When one language model provides more
>> >>> > information than do other language models, it's difficult to
>> >>> > maintain a common abstraction layer. Currently we're looking at
>> >>> > n-gram length. SRILM doesn't provide access to that (but you can
>> >>> > get right-looking state length which is usually the same thing).
>> >>> >
>> >>> > I'm working on making this issue more severe with left-looking state
>> >>> > optimization and explicit hypothesis bounds. How do we change the
>> >>> > decoder to use these features if not all of the language models
>> >>> > support them?
>> >>> >
>> >>> > Maybe another class in the language model hierarchy supporting these
>> >>> > additional features. But it's going to make the decoder look ugly if
>> >>> > you want to support both.
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > On 07/21/11 11:14, Hieu Hoang wrote:
>> >>> > > hi marc,
>> >>> > >
>> >>> > > it'll be good for people to see your changes.
>> >>> > >
>> >>> > > i suppose you should create a branch and make your changes in
>> >>> > > there.
>> >>> > >
>> >>> > > If there are other people interested, you can point them to your
>> >>> > > branch. If more people are interested and it doesn't affect other
>> >>> > > people too much, then we can move it to trunk.
>> >>> > >
>> >>> > > i'll email you offline with svn details
>> >>> > >
>> >>> > > On 21/07/2011 15:16, Marc LEGENDRE wrote:
>> >>> > >> Alright, I gave this a try, and it did it for me.
>> >>> > >> With kenlm, it is a ridiculously straightforward modification,
>> >>> > >> but now I'm not sure how I can submit it:
>> >>> > >> on one hand, I am not a "machine translation guy" and I don't
>> >>> > >> imagine myself digging in every other LM to find how to set the
>> >>> > >> ngram_length value; and on the other hand I would feel guilty to
>> >>> > >> submit a 10-line patch and say "Guys, I need this, would you mind
>> >>> > >> committing it and making the necessary modifications in every
>> >>> > >> other wrapper yourselves?"
>> >>> > >>
>> >>> > >> How do you, Moses developers, feel about this?
>> >>> > >> Is it acceptable / outrageously stupid if I set the value to -1
>> >>> > >> in the other wrappers, maybe with a TODO, and properly document it
>> >>> > >> in the super class?
>> >>> > >>
>> >>> > >> ----- Original message -----
>> >>> > >>> From: "Kenneth Heafield"< [email protected] >
>> >>> > >>> To: [email protected]
>> >>> > >>> Sent: Wednesday, 13 July 2011 20:53:46
>> >>> > >>> Subject: Re: [Moses-support] Using Moses language models
>> >>> > >>>
>> >>> > >>> I'd suggest adding a ngram_length member to LMResult then
>> >>> > >>> modifying each model's wrapper (or just mine) to set that value.
>> >>> > >>>
>> >>> > >>> You're welcome to move stuff from LanguageModelKen.cpp to
>> >>> > >>> LanguageModelKen.h as necessary. I chose this setup to minimize
>> >>> > >>> unnecessary includes.
>> >>> > >>>
>> >>> > >>> Kenneth
>> >>> > >>>
>> >>> > >>> On 07/13/11 14:33, Marc LEGENDRE wrote:
>> >>> > >>>> Well, not only is the header not "public", so to speak (which I
>> >>> > >>>> agree is not a major obstacle), but the desired pointer is also
>> >>> > >>>> a private member of the class, and sadly lacks a getter.
>> >>> > >>>> As far as I know, it means that accessing it will involve
>> >>> > >>>> questionable C++ tricks.
>> >>> > >>>> (never tried, though)
>> >>> > >>>>
>> >>> > >>>> If modifying Moses is not too much of a chore, I'll give it a
>> >>> > >>>> thought.
>> >>> > >>>>
>> >>> > >>>> Anyway, thank you for your answers.
>> >>> > >>>>
>> >>> > >>>> ----- Original message -----
>> >>> > >>>>> From: "Hieu Hoang"< [email protected] >
>> >>> > >>>>> To: [email protected]
>> >>> > >>>>> Sent: Wednesday, 13 July 2011 18:40:11
>> >>> > >>>>> Subject: Re: [Moses-support] Using Moses language models
>> >>> > >>>>>
>> >>> > >>>>> i guess lm::Model is specific to the ken lm implementation. If
>> >>> > >>>>> you want to use it you should include the header yourself and
>> >>> > >>>>> cast whatever you need to get the pointer.
>> >>> > >>>>>
>> >>> > >>>>> if you're feeling generous, maybe you can extend the moses LM
>> >>> > >>>>> wrapper so that all LM implementations have the opportunity to
>> >>> > >>>>> return the length of the n-gram match.
>> >>> > >>>>>
>> >>> > >>>>> On 13/07/2011 21:51, Marc LEGENDRE wrote:
>> >>> > >>>>>> The length of the n-gram match is sufficient for what I want,
>> >>> > >>>>>> indeed.
>> >>> > >>>>>> I figured out how to get it using kenlm directly, but as I am
>> >>> > >>>>>> running the decoder, I wanted to use the already loaded LM.
>> >>> > >>>>>>
>> >>> > >>>>>> I first tried to dig my way through the Moses abstraction
>> >>> > >>>>>> layers to retrieve a pointer to a lm::Model from kenlm, but the
>> >>> > >>>>>> Moses::LanguageModelKen header is not part of the public
>> >>> > >>>>>> headers of Moses; that's why I tried to use only the Moses
>> >>> > >>>>>> interface.
>> >>> > >>>>>>
>> >>> > >>>>>> (I did not mention this alternative; if someone knows how to
>> >>> > >>>>>> get such a pointer, I can carry on from there)
>> >>> > >>>>>>
>> >>> > >>>>>>
>> >>> > >>>>>> ----- Original message -----
>> >>> > >>>>>>> From: "Kenneth Heafield"< [email protected] >
>> >>> > >>>>>>> To: "Marc LEGENDRE"< [email protected] >
>> >>> > >>>>>>> Sent: Wednesday, 13 July 2011 16:12:27
>> >>> > >>>>>>> Subject: Re: [Moses-support] Using Moses language models
>> >>> > >>>>>>>
>> >>> > >>>>>>> The definition of unknown is that the word you asked for (the
>> >>> > >>>>>>> rightmost one) is mapped to <unk>, i.e. an OOV.
>> >>> > >>>>>>>
>> >>> > >>>>>>> Are you looking for:
>> >>> > >>>>>>>
>> >>> > >>>>>>> 1) Length of n-gram matched in the model
>> >>> > >>>>>>>
>> >>> > >>>>>>> or
>> >>> > >>>>>>>
>> >>> > >>>>>>> 2) Length of state you must keep for valid continuation to
>> >>> > >>>>>>> the right
>> >>> > >>>>>>>
>> >>> > >>>>>>> These are slightly different things due to state
>> >>> > >>>>>>> minimization. The moses abstraction layer does not return
>> >>> > >>>>>>> either in a general way. However, if you're using KenLM, #2
>> >>> > >>>>>>> is in the returned state's valid_length_. Further, #1 is in
>> >>> > >>>>>>> FullScoreReturn.ngram_length. So if you call KenLM directly
>> >>> > >>>>>>> these are easy to obtain (and you can decide whether to
>> >>> > >>>>>>> expose them through the Moses abstraction layer).
>> >>> > >>>>>>>
>> >>> > >>>>>>> Outside the decoder, you can run
>> >>> > >>>>>>>
>> >>> > >>>>>>> kenlm/query model_file null
>> >>> > >>>>>>>
>> >>> > >>>>>>> then provide your trigrams on stdin.
>> >>> > >>>>>>>
>> >>> > >>>>>>> Here's an example with kenlm/query kenlm/lm/test.arpa null
>> >>> > >>>>>>>
>> >>> > >>>>>>> looking on a
>> >>> > >>>>>>> looking=23 1 -1.28594 on=25 2 -0.46389 a=5 3 -0.0483513
>> >>> > >>>>>>> Total: -1.79818 OOV: 0
>> >>> > >>>>>>>
>> >>> > >>>>>>> The format is "word=vocab_id ngram_length score". So this is
>> >>> > >>>>>>> a trigram in the model because "a=5 3" appears.
>> >>> > >>>>>>>
>> >>> > >>>>>>> On 07/13/11 08:50, Marc LEGENDRE wrote:
>> >>> > >>>>>>>> Hello,
>> >>> > >>>>>>>>
>> >>> > >>>>>>>> I am trying to use the language models loaded by Moses;
>> >>> > >>>>>>>>
>> >>> > >>>>>>>> I am using a 3-gram LM, and I need to know whether it
>> >>> > >>>>>>>> contains a given N-gram or not.
>> >>> > >>>>>>>> I tried to play around with
>> >>> > >>>>>>>> LanguageModelImplementation::GetValueForgotState(...),
>> >>> > >>>>>>>> but the boolean 'unknown' in the returned structure does
>> >>> > >>>>>>>> not seem to be what I'm looking for.
>> >>> > >>>>>>>>
>> >>> > >>>>>>>> Is there any simple way of getting this piece of
>> >>> > >>>>>>>> information?
>> >>> > >>>>>>>>
>> >>> > >>>>>>>>
>> >>> > >>>>>>>> Regards,
>> >>> > >>>>>>>> Marc Legendre
>> >>> > >>>>>>>> _______________________________________________
>> >>> > >>>>>>>> Moses-support mailing list
>> >>> > >>>>>>>> [email protected]
>> >>> > >>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>> >>>
>> >>>
>> >>>
>> >>> ------------------------------
>> >>>
>> >>> Message: 3
>> >>> Date: Fri, 22 Jul 2011 16:38:53 +0200
>> >>> From: Angelina Ivanova <[email protected]>
>> >>> Subject: [Moses-support] GIZA++: glibc detected
>> >>> To: [email protected]
>> >>> Message-ID: <cahklk21bie0unchhrtdvqe69ep0i5k83+jvrnm7woiohocx...@mail.gmail.com>
>> >>> Content-Type: text/plain; charset=ISO-8859-1
>> >>>
>> >>> Hello,
>> >>> I got the error below when I tried to align Russian to English. I
>> >>> searched for the error on the Internet and found out that the cause of
>> >>> the problem could be a null sentence in the corpus. However, I
>> >>> didn't detect any null sentences in my corpus. The encoding is UTF-8,
>> >>> and all previous experiments with a corpus that contained this one
>> >>> as a subset went smoothly. Could you please help me?
>> >>>
>> >>> *** glibc detected ***/moses/tools/bin/GIZA++: double free or
>> >>> corruption (out): 0x14901578 ***
>> >>> ======= Backtrace: =========
>> >>> [0x8166e81]
>> >>> [0x8168946]
>> >>> [0x813ebb1]
>> >>> [0x80e6fe9]
>> >>> [0x80d8420]
>> >>> [0x80da791]
>> >>> [0x806f55a]
>> >>> [0x80742e8]
>> >>> [0x814d9bb]
>> >>> [0x8048151]
>> >>> ======= Memory map: ========
>> >>> 00d4e000-00d4f000 r-xp 00000000 00:00 0          [vdso]
>> >>> 08048000-081f6000 r-xp 00000000 00:1e 1612751353 /moses/tools/bin/GIZA++
>> >>> 081f6000-081f8000 rw-p 001ae000 00:1e 1612751353 /moses/tools/bin/GIZA++
>> >>> 081f8000-081ff000 rw-p 00000000 00:00 0
>> >>> 082ce000-1580d000 rw-p 00000000 00:00 0          [heap]
>> >>> b5f00000-b5f23000 rw-p 00000000 00:00 0
>> >>> b5f23000-b6000000 ---p 00000000 00:00 0
>> >>> b6093000-b6106000 rw-p 00000000 00:00 0
>> >>> b6179000-b7099000 rw-p 00000000 00:00 0
>> >>> b70dd000-b7525000 rw-p 00000000 00:00 0
>> >>> b7561000-b76a7000 rw-p 00000000 00:00 0
>> >>> b76c0000-b7779000 rw-p 00000000 00:00 0
>> >>> bfb6a000-bfb7f000 rw-p 00000000 00:00 0          [stack]
>> >>> Exit code: 1
>> >>>
>> >>>
>> >>> ------------------------------
>> >>>
>> >>>
>> >>>
>> >>> End of Moses-support Digest, Vol 57, Issue 40
>> >>> *********************************************
>> >>
>> >>
>> >>
>> >> --
>> >> Thu.
>> >>
>> >>
>> >>
>> >
>> >
>>
>>
>>
>> --
>>
>> **********************************************************************************
>> Jörg Tiedemann                          jorg.tiedem...@lingfil.uu.se
>> Dep. of Linguistics and Philology       http://stp.lingfil.uu.se/~joerg/
>> Uppsala University                      tel: +46 (0)18 - 471 1412
>> Box 635, SE-751 26 Uppsala/SWEDEN       fax: +46 (0)18 - 471 1094
>>
>>
>>
>> ------------------------------
>>
>> Message: 2
>> Date: Fri, 22 Jul 2011 14:18:21 -0400
>> From: Kenneth Heafield <[email protected]>
>> Subject: Re: [Moses-support] Using Moses language models
>> To: Marc LEGENDRE <[email protected]>
>> Cc: [email protected], [email protected]
>> Message-ID: <[email protected]>
>> Content-Type: text/plain; charset=ISO-8859-15
>>
>> Hi Marc,
>>
>>        This sounds like a simple change, so a branch is probably too much
>> overhead.  Please do one of the following:
>>
>> 1. Send a patch as generated by diff -rupN $old $new .  Do a make clean
>> first.
>> 2. Attach the files you modified and send them, along with the revision
>> you based changes on.
>> 3. Make a branch (if you already did).
>>
>> Thanks,
>>
>> Kenneth
>>
>> On 07/22/11 04:21, Marc LEGENDRE wrote:
>> > Well, we (me and the people I work with) were hoping not to have to
>> > maintain a modified version of Moses.
>> >
>> > Luckily, obviousness just hit me like a truck: if something is specific
>> > to a LM, it does not have to be in the top layer.
>> > Having a common interface does not prevent subclasses from having
>> > specific behaviour. We could have a LanguageModelKen method, say
>> > GetValueForgotStateKen(...), which would return something specific,
>> > say a LMKenResult, which would contain a LMResult plus other things
>> > like, say, a ngram_length field :-).
>> > And the virtual GetValueForgotState() method would simply return the
>> > LMResult from there.
>> >
>> > This way, no need to break the high level API,
>> > and no extra maintenance cost for us (me and the peop... Well, you
>> > know).
>> >
>>
>>
>> ------------------------------
>>
>>
>>
>> End of Moses-support Digest, Vol 57, Issue 44
>> *********************************************
>
>
>
>
> --
> Thu.
>
>
>

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
