Hello all,

I'm currently trying to train Moses on aligned subtitles obtained from
the OPUS corpus website. The files have been cleaned and formatted in a
similar way to the standard Europarl files.

I get a series of NaN errors once Giza begins the HMM stage of
training. The corpus has been cleaned using the appropriate script and
the sentence length has been limited to 40, although many sentences are
much shorter than this.

I'm guessing there are some strange characters messing things up, or
something like that, but I wondered whether others had encountered this
issue and could offer advice.
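
One way to test the strange-characters guess is to scan each side of the
corpus before training. The following is only a minimal sketch in Python:
the function name find_suspect_lines is hypothetical (it is not part of
Moses or Giza), and the choice of what counts as suspect (empty lines,
lines over the length limit, Unicode control or format characters such as
NUL or a stray byte-order mark) is an assumption, not a confirmed cause of
the NaN errors.

```python
import unicodedata


def find_suspect_lines(lines, max_len=40):
    """Return (line_number, reason) pairs for lines worth inspecting.

    Hypothetical helper: the checks are assumptions about what might
    upset alignment training, not a list taken from Moses or Giza.
    """
    suspects = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        tokens = text.split()
        if not tokens:
            # An empty sentence on either side breaks the 1:1 pairing.
            suspects.append((number, "empty line"))
        elif len(tokens) > max_len:
            suspects.append((number, "too long"))
        elif any(
            unicodedata.category(ch) in ("Cc", "Cf") and ch != "\t"
            for ch in text
        ):
            # Cc = control characters (NUL, CR, ...), Cf = format
            # characters (byte-order mark, zero-width space, ...).
            suspects.append((number, "control or format character"))
    return suspects


if __name__ == "__main__":
    sample = ["hello world\n", "\n", "bad\x00char\n"]
    for number, reason in find_suspect_lines(sample):
        print(number, reason)
```

To run it on real data, pass an open file object for each side of the
corpus (opened with encoding="utf-8") and compare the flagged line numbers
across the two languages.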

 

Many thanks,

Kevin.

Kevin A. Wilson, MS
Research Computing Division
RTI International
3040 Cornwallis Road
P.O. Box 12194
Research Triangle Park
NC  27709-2194
(919) 485-5521
www.rti.org <http://www.rti.org/>

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
