Chris Dyer wrote:
> I haven't looked into what's causing the particular problem on this
> corpus, but another known problem with the GIZA HMM model is that it
> doesn't do a fairly standard kind of normalization in the
> forward-backward training, which causes underflow errors in some
> sentences (especially quite long ones), which also leads to this
> problem.
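
For anyone unfamiliar with the normalization Chris mentions: as I understand it, the usual trick is to rescale the forward probabilities at every position and accumulate the logs of the scale factors. Below is a minimal toy sketch of that idea in Python. It is not GIZA code; the function names and the tiny two-state model are made up purely for illustration.

```python
import math

def forward_unscaled(init, trans, emits):
    """Plain forward pass. Returns P(observations), which can underflow to 0.0."""
    n = len(init)
    alpha = [init[s] * emits[0][s] for s in range(n)]
    for e in emits[1:]:
        alpha = [e[s] * sum(alpha[r] * trans[r][s] for r in range(n))
                 for s in range(n)]
    return sum(alpha)

def forward_scaled(init, trans, emits):
    """Forward pass with per-position normalization. Returns log P(observations)."""
    n = len(init)
    alpha = [init[s] * emits[0][s] for s in range(n)]
    log_prob = 0.0
    for t in range(len(emits)):
        if t > 0:
            e = emits[t]
            alpha = [e[s] * sum(alpha[r] * trans[r][s] for r in range(n))
                     for s in range(n)]
        scale = sum(alpha)                  # probability mass at this position
        log_prob += math.log(scale)         # keep the magnitude in log space
        alpha = [a / scale for a in alpha]  # renormalize so alpha sums to 1
    return log_prob

# Hypothetical toy model: 2 states, uniform start and transitions, and an
# emission probability of 1e-4 for every state at every position.
init = [0.5, 0.5]
trans = [[0.5, 0.5], [0.5, 0.5]]
long_sentence = [[1e-4, 1e-4]] * 100

print(forward_unscaled(init, trans, long_sentence))  # underflows in double precision
print(forward_scaled(init, trans, long_sentence))    # finite log-probability
```

With 100 positions the true probability here is 1e-400, which is below the smallest representable double, so the unscaled version collapses to zero while the scaled version still returns a usable log-probability. That would fit the observation that long sentences are the ones that trigger the problem.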

I see from the archives that this has been reported a number of times, and I am now running into it, training on about 1.8 million segments from the LDC Hong Kong corpus. I had no such problem on a 100K subset of this data, so I suspect it is indeed an issue of corpus size and underflow. FWIW, I'm using the default parameters for the training script.

Qin Gao suggested a patch to Array2.h in the GIZA code - does this indeed fix the problem? If not, has anyone found another solution or a workaround?

Thanks.

- John Burger
  MITRE
