Hi, I get this error when trying to tokenize merged documents:

~/demo/tools/bin/moses-scripts/scripts-20110118-1456/tokenizer$ ./tokenizer.perl -l en \
    < /home/roberto/demo/tools/working-dir/corpus/raw.en \
    > /home/roberto/demo/tools/working-dir/corpus/europarl222.tok.en
Tokenizer Version 1.0
Language: en
utf8 "\xED" does not map to Unicode at ./tokenizer.perl line 48, <STDIN> line 72.
Malformed UTF-8 character (fatal) at ./tokenizer.perl line 67, <STDIN> line 72.

Is this a utf8 vs. UTF-8 issue? How do I fix this error?

I tried the fix at http://comments.gmane.org/gmane.comp.nlp.moses.user/3109
but it does not work for me.
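
The byte 0xED is "í" in ISO-8859-1 (Latin-1), so my guess is that one of the merged documents was saved as Latin-1 rather than UTF-8. To check that, here is a minimal Python sketch that lists the lines that are not valid UTF-8 and re-encodes only those lines. It is not part of the Moses scripts, the function names are made up for illustration, and the Latin-1 source encoding is an assumption that should be verified by looking at the reported lines first.

```python
def find_invalid_utf8_lines(data: bytes):
    """Return (line_number, raw_line) pairs for lines that are not valid UTF-8.

    Splitting on the newline byte is safe because 0x0A never occurs
    inside a multi-byte UTF-8 sequence.
    """
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


def repair_as_latin1(data: bytes) -> bytes:
    """Re-encode only the invalid lines as UTF-8.

    ASSUMPTION: the invalid lines are ISO-8859-1 (Latin-1). If they are in
    some other encoding, the output will be valid UTF-8 but wrong text.
    """
    fixed = []
    for line in data.split(b"\n"):
        try:
            line.decode("utf-8")
            fixed.append(line)  # already valid, keep the bytes untouched
        except UnicodeDecodeError:
            fixed.append(line.decode("latin-1").encode("utf-8"))
    return b"\n".join(fixed)
```

The idea would be to read raw.en in binary mode, pass the bytes to find_invalid_utf8_lines to see whether line 72 shows up, and then write the result of repair_as_latin1 to a new file before running tokenizer.perl on it. Lines that are already valid UTF-8 are left unchanged, so a file that mixes both encodings is not double-encoded.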

Thanks
