Are your text files UTF-8 encoded? Are there double spaces or tabs in your
data?
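
If it helps, here is a minimal Python sketch for checking those three things
(UTF-8 validity, tabs, double spaces) line by line. It is not part of Moses;
the names check_line and check_file are made up for this example, and the
corpus file names are whatever you pass on the command line.

```python
import sys


def check_line(raw):
    """Return a list of problems found in one raw (bytes) corpus line."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Nothing else can be checked reliably if the bytes are not UTF-8.
        return ["not valid UTF-8"]
    text = text.rstrip("\n")
    problems = []
    if "\t" in text:
        problems.append("tab")
    if "  " in text:
        problems.append("double space")
    return problems


def check_file(path):
    """Map 1-based line numbers to the problems found on that line."""
    report = {}
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            problems = check_line(raw)
            if problems:
                report[number] = problems
    return report


if __name__ == "__main__":
    # Hypothetical usage: python check_corpus.py corpus.en corpus.ja
    for path in sys.argv[1:]:
        for number, problems in sorted(check_file(path).items()):
            print("%s:%d: %s" % (path, number, ", ".join(problems)))
```

Run it on both sides of the parallel corpus; any line it reports is worth
looking at before training.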

This won't affect mkcls but it will affect phrase extraction later on -
have you escaped the Moses reserved characters? You can do it with the Moses
script
   scripts/tokenizer/escape-special-chars.perl
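
To give an idea of what "escaping" means here, below is a rough Python sketch
of the kind of replacement involved. The character-to-entity table is my
recollection of what the script does and should be treated as an assumption;
the function name escape_reserved is made up for this example. Use the real
Moses script on your actual data.

```python
# Assumed mapping from reserved characters to entities.
# "&" must be handled first, otherwise the "&" inside the entities
# produced by the later replacements would be escaped a second time.
ESCAPES = [
    ("&", "&amp;"),
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_reserved(line):
    """Replace each reserved character in the line with its entity."""
    for char, entity in ESCAPES:
        line = line.replace(char, entity)
    return line


if __name__ == "__main__":
    print(escape_reserved("a | b [c]"))
```

The "|" character matters most for training, since Moses uses it as the
factor separator.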



On 17 February 2014 22:40, youyou fan <[email protected]> wrote:

>
>
> ---------- Forwarded message ----------
> From: *youyou fan* <[email protected]>
> Date: Monday, 17 February 2014
> Subject: unknown training messages
> To: "[email protected]" <[email protected]>
>
>
>
> Hello all,
>
>  I used the command "train-model.perl" to train a Japanese-English model. In
> the first step (I think it is 'mkcls'), I got a lot of messages like below.
>
>
> Use of uninitialized value in concatenation (.) or string at
> ./train-model.perl line 924, <IN_EN> line 4375.
> 'nknown word '.
> Use of uninitialized value in concatenation (.) or string at
> ./train-model.perl line 924, <IN_EN> line 4375.
> 'nknown word '。
> Use of uninitialized value in concatenation (.) or string at
> ./train-model.perl line 924, <IN_EN> line 4376.
> 'nknown word '.
> Use of uninitialized value in concatenation (.) or string at
> ./train-model.perl line 924, <IN_EN> line 4376.
> 'nknown word '。
> ...
>
> 'nknown word '. is the last word of each English sentence and
> 'nknown word '。 is the last word of each Japanese sentence.
>
> So what should I do about the parallel corpus?
> And do these errors affect the training model?
>
> Regards,
> Fan
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>


-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
