Hi,

I get this error when trying to tokenize merged documents:

    ~/demo/tools/bin/moses-scripts/scripts-20110118-1456/tokenizer$ ./tokenizer.perl -l en < /home/roberto/demo/tools/working-dir/corpus/raw.en > /home/roberto/demo/tools/working-dir/corpus/europarl222.tok.en
    Tokenizer Version 1.0
    Language: en
    utf8 "\xED" does not map to Unicode at ./tokenizer.perl line 48, <STDIN> line 72.
    Malformed UTF-8 character (fatal) at ./tokenizer.perl line 67, <STDIN> line 72.

Is this a utf8 vs. UTF-8 issue? How do I fix this error? I tried the fix described at http://comments.gmane.org/gmane.comp.nlp.moses.user/3109 but it does not work for me.

Thanks
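
For reference, the error says that line 72 of the input contains a byte (0xED) that is not valid UTF-8 at that position. One way to narrow this down might be a small check like the sketch below. It is not part of the Moses scripts; the function names are made up for illustration, and the Latin-1 fallback is only a guess (0xED is "í" in ISO-8859-1), so the real encoding of the merged files may differ.

```python
def find_invalid_utf8_lines(data: bytes):
    """Return (line_number, raw_line) pairs for lines that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


def repair_to_utf8(data: bytes, fallback: str = "latin-1") -> bytes:
    """Re-encode every line as UTF-8.

    Lines that already decode as UTF-8 are kept unchanged. Lines that do not
    are decoded with `fallback` instead (ASSUMPTION: the stray bytes are
    Latin-1; change this if the source files use another encoding).
    """
    fixed = []
    for line in data.split(b"\n"):
        try:
            text = line.decode("utf-8")
        except UnicodeDecodeError:
            text = line.decode(fallback)
        fixed.append(text.encode("utf-8"))
    return b"\n".join(fixed)


if __name__ == "__main__":
    # Hypothetical sample: line 2 is valid UTF-8, line 3 holds a raw 0xED byte.
    sample = b"hello\ncaf\xc3\xa9\naqu\xed\n"
    for number, line in find_invalid_utf8_lines(sample):
        print(f"line {number} is not valid UTF-8: {line!r}")
    print(repair_to_utf8(sample))
```

On the real corpus, the same functions could be applied to the bytes read from raw.en (opened in binary mode), and the repaired bytes written to a new file before running tokenizer.perl on it.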
