Hello all,
I'm currently trying to train Moses on aligned subtitles obtained from the OPUS corpus website. The files have been cleaned and formatted in a similar way to the standard Europarl files.

A series of NaN errors appears after Giza begins the HMM stage of training. The corpus has been cleaned using the appropriate script, and the sentence length has been limited to 40, although many sentences are much shorter than this.

I'm guessing there are some strange characters messing things up, or something like that, but I wondered whether others have encountered this issue and could provide advice.

Many thanks,
Kevin.

Kevin A. Wilson, MS
Research Computing Division
RTI International
3040 Cornwallis Road
P.O. Box 12194
Research Triangle Park NC 27709-2194
(919) 485-5521
www.rti.org <http://www.rti.org/>
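
One way to test the "strange characters" guess is to scan each side of the corpus before training. The following is only a minimal sketch, assuming plain-text files with one whitespace-tokenized sentence per line. The function name find_suspicious_lines and its max_len parameter are hypothetical and are not part of Moses or Giza. Whether lines like these actually cause the NaN errors is not established here.

```python
import unicodedata


def find_suspicious_lines(lines, max_len=40):
    """Return (line_number, reason) pairs for lines worth inspecting by hand.

    Hypothetical helper for illustration only; max_len mirrors the
    sentence-length limit of 40 mentioned in the message.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        tokens = text.split()
        if not tokens:
            problems.append((number, "empty line"))
            continue
        if len(tokens) > max_len:
            problems.append((number, "more than %d tokens" % max_len))
        for ch in text:
            # Unicode general categories starting with "C" cover control,
            # format, surrogate, private-use, and unassigned code points.
            if unicodedata.category(ch).startswith("C"):
                problems.append((number, "unexpected character U+%04X" % ord(ch)))
                break
    return problems


if __name__ == "__main__":
    # Small in-memory sample standing in for one side of the corpus.
    sample = ["hello world\n", "\n", "bad\x00char\n"]
    for number, reason in find_suspicious_lines(sample):
        print(number, reason)
```

Running the same check on both language files and comparing the reported line numbers would show whether a problem line sits on one side only, which matters because the two files have to stay aligned line by line.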
