The fix reported by Qin Gao does indeed repair some of the NaN
problems, so I would certainly advise you to incorporate this into
your GIZA build.  However, with 1.8M segments, you may well be
encountering an underflow situation, so this may not fix the problem.
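
For anyone unfamiliar with the normalization mentioned in the quoted
message below, here is a minimal sketch of the idea in Python.  It is
not GIZA's code and not the Array2.h patch; the toy two-state model and
all names (forward_unscaled, forward_scaled, init, trans, emit) are made
up for illustration.  It shows only the forward pass: rescaling the
forward variables at every position and accumulating the log of the
scale factors keeps the computation in range, whereas the plain
recursion collapses to zero on a long input.  A full forward-backward
pass would typically reuse the same scale factors in the backward
recursion.

```python
import math

def forward_unscaled(init, trans, emit, obs):
    """Plain forward recursion. Returns P(obs) as a raw probability."""
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)

def forward_scaled(init, trans, emit, obs):
    """Forward recursion with per-position rescaling. Returns log P(obs)."""
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            o = obs[t]
            alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                     for s in range(n)]
        # Renormalize so the forward variables sum to 1, and remember
        # the factor we divided out in log space.
        scale = sum(alpha)
        log_prob += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_prob

# Hypothetical toy model: 2 hidden states, 4 observable symbols.
init = [0.5, 0.5]
trans = [[0.7, 0.3],
         [0.4, 0.6]]
emit = [[0.4, 0.3, 0.2, 0.1],
        [0.1, 0.2, 0.3, 0.4]]

# A deliberately long observation sequence to provoke underflow.
long_obs = [t % 4 for t in range(2000)]

print("unscaled P(obs):  ", forward_unscaled(init, trans, emit, long_obs))
print("scaled log P(obs):", forward_scaled(init, trans, emit, long_obs))
```

On the long sequence the unscaled version underflows to 0.0, which is
the kind of value that turns into NaN once it is used as a divisor
during re-estimation, while the scaled version returns a finite log
probability.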
Chris

On Tue, Mar 25, 2008 at 8:02 AM, John D. Burger <[EMAIL PROTECTED]> wrote:
> Chris Dyer wrote:
>
>  > I haven't looked into what's causing the particular problem on this
>  > corpus, but another known problem with the GIZA HMM model is that it
>  > doesn't do a fairly standard kind of normalization in the
>  > forward-backward training, which causes underflow errors in some
>  > sentences (especially quite long ones), which also leads to this
>  > problem.
>
>  I see from the archives that this has been reported a number of
>  times, and I am now running into it, training on about 1.8 million
>  segments from the LDC Hong Kong corpus.  I had no such problem on a
>  100K subset of this data, so I suspect it is indeed an issue of
>  corpus size and underflow.  FWIW, I'm using the default parameters
>  for the training script.
>
>  Qin Gao suggested a patch to Array2.h in the GIZA code - does this
>  indeed fix the problem?  If not, has anyone found another solution or
>  a workaround?
>
>  Thanks.
>
>  - John Burger
>    MITRE
>
>
> _______________________________________________
>  Moses-support mailing list
>  [email protected]
>  http://mailman.mit.edu/mailman/listinfo/moses-support
>