Hi, Wilson,
As I mentioned, GIZA++ may have a bug in the HMM training stage: it can
add random values to the count table, and that may be the cause of the
NaN errors you are seeing. The mailing list archive has a description of
the bug. To fix it, you can simply comment out the lines marked with
//*******// in Array2.h, so that the size check is always compiled in:

    inline T*begin(){
    #ifdef __STL_DEBUG //*******//
      if( h1==0||h2==0)return 0;
    #endif //*******//
      return &(p[0]);
    }
    inline T*end(){
    #ifdef __STL_DEBUG //*******//
      if( h1==0||h2==0)return 0;
    #endif //*******//
      return &(p[0])+p.size();
    }
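
For reference, here is a minimal sketch of what the two accessors amount
to once the marked lines are commented out. It is not the actual GIZA++
source: the class name GuardedArray2, its constructor, and the use of
std::vector for the storage p are assumptions made only so the example
compiles on its own. The point it illustrates is that the zero-size check
runs unconditionally, so &(p[0]) is never evaluated on an empty container
(indexing an empty std::vector is undefined behavior in C++).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Array2 template in Array2.h; only the
// members needed to show the guard are reproduced here.
template <class T>
class GuardedArray2 {
    std::size_t h1, h2;  // dimensions, named as in the snippet above
    std::vector<T> p;    // flat storage (assumed to be a std::vector)
public:
    GuardedArray2(std::size_t rows, std::size_t cols)
        : h1(rows), h2(cols), p(rows * cols) {}

    // The guard is no longer wrapped in #ifdef __STL_DEBUG, so it is
    // always active: an empty array yields a null pointer.
    T* begin() {
        if (h1 == 0 || h2 == 0) return nullptr;
        return &p[0];
    }
    T* end() {
        if (h1 == 0 || h2 == 0) return nullptr;
        return &p[0] + p.size();
    }
};
```

With this sketch, an array built with a zero dimension returns a null
pointer from both begin() and end(), while a 2 x 3 array gives a range of
six elements.
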
You may also be interested in trying the new multi-threaded version of
GIZA++, which has this bug fixed and runs much faster. It is available
here:
http://www.cs.cmu.edu/~qing/
Best,
Qin
Wilson, Kevin wrote:
>
> Hello all,
>
> I’m currently trying to train Moses on aligned subtitles obtained from
> the opus corpus website. The files have been cleaned and formatted in
> a similar way to the standard Europarl files.
>
> There is a series of NaN errors after GIZA++ begins the HMM stage of
> training. The corpus has been cleaned using the appropriate script and
> the sentence length has been limited to 40, although many sentences
> are much shorter than this.
>
> I’m guessing there are some strange characters messing things up or
> something like that, but wondered if others had encountered this issue
> and could possibly provide advice.
>
> Many thanks,
>
> Kevin.
>
> *Kevin A. Wilson, MS*
>
> Research Computing Division
>
> RTI International
>
> 3040 Cornwallis Road
>
> P.O. Box 12194
>
> Research Triangle Park
>
> NC 27709-2194
>
> (919) 485-5521
>
> www.rti.org <http://www.rti.org/>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>