Are your text files UTF-8 encoded? Are there double spaces or tabs in your data?
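A small checker can answer both questions for each side of the corpus. The Python sketch below is one way to do it. The function name `find_corpus_problems` and the command-line usage are hypothetical and not part of Moses. It reads each file in binary mode, splits on `\n`, and reports every line that is not valid UTF-8 or that contains a tab or two consecutive spaces.

```python
import sys
from typing import List, Tuple


def find_corpus_problems(raw: bytes) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for suspicious corpus lines.

    Hypothetical helper: `raw` is the whole file read in binary mode, so an
    invalid UTF-8 byte sequence can be pinned to the line that contains it.
    """
    problems: List[Tuple[int, str]] = []
    lines = raw.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final newline
    for number, line in enumerate(lines, start=1):
        try:
            text = line.decode("utf-8")
        except UnicodeDecodeError:
            problems.append((number, "not valid UTF-8"))
            continue
        if "\t" in text:
            problems.append((number, "contains a tab"))
        if "  " in text:
            problems.append((number, "contains a double space"))
    return problems


if __name__ == "__main__":
    # Hypothetical usage: python check_corpus.py corpus.ja corpus.en
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            for number, message in find_corpus_problems(handle.read()):
                print(f"{path}:{number}: {message}")
```

Run it on both the Japanese and the English file; any line it prints is worth fixing before training.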
This won't affect mkcls, but it will affect phrase extraction later on. Have you escaped the Moses reserved characters? You can do this with the Moses script scripts/tokenizer/escape-special-chars.perl

On 17 February 2014 22:40, youyou fan <[email protected]> wrote:

> ---------- Forwarded message ----------
> From: *youyou fan* <[email protected]>
> Date: Monday, 17 February 2014
> Subject: unknown training messages
> To: "[email protected]" <[email protected]>
>
> Hello all,
>
> I used the command "train-model.perl" to train a Japanese-English model. In
> the first step (I think it is 'mkcls'), I got a lot of messages like the ones below.
>
> Use of uninitialized value in concatenation (.) or string at
> ./train-model.perl line 924, <IN_EN> line 4375.
> 'nknown word '.
> Use of uninitialized value in concatenation (.) or string at
> ./train-model.perl line 924, <IN_EN> line 4375.
> 'nknown word '。
> Use of uninitialized value in concatenation (.) or string at
> ./train-model.perl line 924, <IN_EN> line 4376.
> 'nknown word '.
> Use of uninitialized value in concatenation (.) or string at
> ./train-model.perl line 924, <IN_EN> line 4376.
> 'nknown word '。
> ...
>
> 'nknown word '. is the last word of each English sentence, and
> 'nknown word '。 is the last word of each Japanese sentence.
>
> So what should I do about the parallel corpus?
> And do these errors affect the trained model?
>
> Regards,
> Fan
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
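The escaping step suggested in the reply can be approximated in Python for illustration. The character-to-entity mapping below is an assumption based on our reading of scripts/tokenizer/escape-special-chars.perl (`|` is the Moses factor separator, `[` and `]` mark syntax non-terminals, the rest are XML-special); check it against the copy of the script in your own Moses checkout, which may also do extra cleanup. The name `escape_moses` is hypothetical. A single `str.translate` pass is used so that the `&` introduced by one entity is never escaped a second time.

```python
import sys

# Assumed mapping, modeled on escape-special-chars.perl; verify against
# your Moses version before relying on it.
_MOSES_ESCAPES = str.maketrans({
    "&": "&amp;",
    "|": "&#124;",  # factor separator
    "<": "&lt;",
    ">": "&gt;",
    "'": "&apos;",
    '"': "&quot;",
    "[": "&#91;",   # syntax non-terminal
    "]": "&#93;",   # syntax non-terminal
})


def escape_moses(line: str) -> str:
    """Replace Moses-reserved characters in one line with entities.

    translate() handles each input character exactly once, so the '&'
    produced by an entity such as '&lt;' is not re-escaped.
    """
    return line.translate(_MOSES_ESCAPES)


if __name__ == "__main__":
    # Hypothetical usage: python escape_moses.py corpus.en > corpus.esc.en
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                sys.stdout.write(escape_moses(line.rstrip("\n")) + "\n")
```

In practice, prefer the real script, e.g. `perl scripts/tokenizer/escape-special-chars.perl < corpus.en > corpus.esc.en`, assuming it reads standard input and writes standard output like the other Moses tokenizer scripts.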
