Sorry, I am not sure the bug I reported is directly related to this
issue, because it is somewhat "random" (a read violation at an arbitrary
address) and is hard to reproduce across machines. What we can do is
apply the fix and try again. I will also look into the problem you
mentioned.
Chris Dyer wrote:
> I haven't looked into what's causing the particular problem on this
> corpus, but another known problem with the GIZA HMM model is that it
> doesn't do a fairly standard kind of normalization in the
> forward-backward training, which causes underflow errors in some
> sentences (especially quite long ones), which also leads to this
> problem.
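
A note on the normalization mentioned above: one common form is to
rescale the forward probabilities at every time step and accumulate the
logs of the scaling factors, so that no intermediate value shrinks toward
zero. The sketch below is only an illustration under that assumption. It
is not GIZA++ code, and ToyHmm, forwardUnscaled and forwardScaledLog are
made-up names. It contrasts an unscaled forward pass, which can underflow
to 0 on a long observation sequence, with a scaled one that returns a
log-likelihood.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical toy HMM for illustration; none of these names come from GIZA++.
struct ToyHmm {
    std::vector<double> initial;                  // initial[i]: P(state i at t = 0)
    std::vector<std::vector<double>> transition;  // transition[i][j]: P(next = j | current = i)
    std::vector<std::vector<double>> emission;    // emission[i][o]: P(symbol o | state i)
};

// Plain forward pass. Returns P(observations) directly, so the running
// product of probabilities can underflow to 0 on long sequences.
double forwardUnscaled(const ToyHmm& hmm, const std::vector<int>& obs) {
    assert(!obs.empty());
    const std::size_t n = hmm.initial.size();
    std::vector<double> alpha(n), next(n);
    for (std::size_t i = 0; i < n; ++i)
        alpha[i] = hmm.initial[i] * hmm.emission[i][obs[0]];
    for (std::size_t t = 1; t < obs.size(); ++t) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t i = 0; i < n; ++i)
                sum += alpha[i] * hmm.transition[i][j];
            next[j] = sum * hmm.emission[j][obs[t]];
        }
        alpha.swap(next);
    }
    double total = 0.0;
    for (double a : alpha) total += a;
    return total;
}

// Scaled forward pass. After each step alpha is renormalized to sum to 1
// and the log of the scaling factor is accumulated, so the function
// returns log P(observations) without tiny intermediate values.
double forwardScaledLog(const ToyHmm& hmm, const std::vector<int>& obs) {
    assert(!obs.empty());
    const std::size_t n = hmm.initial.size();
    std::vector<double> alpha(n), next(n);
    double logLikelihood = 0.0;
    for (std::size_t t = 0; t < obs.size(); ++t) {
        double scale = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            double prior = 0.0;
            if (t == 0) {
                prior = hmm.initial[j];
            } else {
                for (std::size_t i = 0; i < n; ++i)
                    prior += alpha[i] * hmm.transition[i][j];
            }
            next[j] = prior * hmm.emission[j][obs[t]];
            scale += next[j];
        }
        // A genuinely impossible sequence has probability 0, i.e. log = -inf.
        if (scale == 0.0) return -std::numeric_limits<double>::infinity();
        for (std::size_t j = 0; j < n; ++j) alpha[j] = next[j] / scale;
        logLikelihood += std::log(scale);
    }
    return logLikelihood;
}
```

For example, with a two-state model in which every state emits symbol 0
with probability 0.001, a sequence of 2000 such symbols has probability
1e-6000, which a double cannot represent. forwardUnscaled returns 0 for
it, while forwardScaledLog returns 2000 * log(0.001). A zero total of
that kind can later turn a count normalization into 0/0, that is, NaN.
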
>
> It seems that different systems handle very small floating point
> numbers differently, so this is a bigger or smaller problem depending
> on the build, but it may also interact with the fix that Qin is
> reporting. Qin, have you been able to determine whether your fix
> corrects the problem with the German-English alignment?
>
> Chris
>
> On Thu, Feb 28, 2008 at 12:50 PM, Qin Gao <[EMAIL PROTECTED]> wrote:
>
>> Hi, Wilson,
>>
>> As I mentioned, GIZA++ may have a bug in the HMM training stage: it can
>> add random numbers to the count table, which may be the cause here. You
>> can check the mailing list archive for a description of the bug. To fix
>> it, simply comment out the lines marked with //*******// in Array2.h:
>>
>> inline T*begin(){
>> #ifdef __STL_DEBUG //*******//
>> if( h1==0||h2==0)return 0;
>> #endif //*******//
>> return &(p[0]);
>> }
>> inline T*end(){
>> #ifdef __STL_DEBUG //*******//
>> if( h1==0||h2==0)return 0;
>> #endif //*******//
>> return &(p[0])+p.size();
>> }
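
To spell out what the change above does: once the two marked lines are
commented out, the zero-size check no longer depends on __STL_DEBUG, so
begin() and end() never evaluate p[0] on an empty table. The sketch
below is a hypothetical stand-in, not the real class from Array2.h.
Grid2 and sumAll are made-up names, and it assumes p is a std::vector,
which the snippet suggests but does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the 2-D array in Array2.h; only the guard in
// begin()/end() is the point of this sketch.
template <class T>
class Grid2 {
public:
    Grid2(std::size_t rows, std::size_t cols)
        : h1(rows), h2(cols), p(rows * cols) {}

    T* begin() {
        // Unconditional guard: indexing p[0] on an empty vector is
        // undefined behavior, so return a null range instead.
        if (h1 == 0 || h2 == 0) return nullptr;
        return &p[0];
    }

    T* end() {
        if (h1 == 0 || h2 == 0) return nullptr;
        return &p[0] + p.size();
    }

private:
    std::size_t h1, h2;  // dimensions, named as in the quoted snippet
    std::vector<T> p;    // flat storage (assumed to be a std::vector)
};

// Sums every cell by walking [begin(), end()). For an empty grid both
// pointers are null, so the loop body never runs.
template <class T>
T sumAll(Grid2<T>& grid) {
    T total = T();
    for (T* it = grid.begin(); it != grid.end(); ++it) total += *it;
    return total;
}
```

With the guard always active, an empty table yields begin() == end(),
so code that iterates over the range simply does nothing instead of
reading from an invalid address.
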
>>
>> You may also be interested in trying a new multi-threaded version of
>> GIZA++, which has this bug fixed and runs much faster. It is here:
>>
>> http://www.cs.cmu.edu/~qing/
>>
>> Best,
>> Qin
>>
>>
>>
>> Wilson, Kevin wrote:
>> >
>> > Hello all,
>> >
>> > I'm currently trying to train Moses on aligned subtitles obtained from
>> > the opus corpus website. The files have been cleaned and formatted in
>> > a similar way to the standard Europarl files.
>> >
>> > There is a series of NAN errors after Giza begins the HMM stage of
>> > training. The corpus has been cleaned using the appropriate script and
>> > the sentence length has been limited to 40, although many sentences
>> > are much shorter than this.
>> >
>> > I'm guessing there are some strange characters messing things up or
>> > something like that, but wondered if others had encountered this issue
>> > and could possibly provide advice.
>> >
>> > Many thanks,
>> >
>> > Kevin.
>> >
>> > *Kevin A. Wilson, MS*
>> >
>> > Research Computing Division
>> >
>> > RTI International
>> >
>> > 3040 Cornwallis Road
>> >
>> > P.O. Box 12194
>> >
>> > Research Triangle Park
>> >
>> > NC 27709-2194
>> >
>> > (919) 485-5521
>> >
>> > www.rti.org <http://www.rti.org/>
>> >
>> > ------------------------------------------------------------------------
>> >
>> > _______________________________________________
>> > Moses-support mailing list
>> > [email protected]
>> > http://mailman.mit.edu/mailman/listinfo/moses-support
>> >