Hi Cyrine,

I think this is because tokenizer.perl expects UTF-8 input on STDIN; the tokenizer script sets this with the line binmode(STDIN, ':utf8');.
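To illustrate the point, the byte "\xE9" reported in the error message is how ISO-8859-1 (Latin-1) encodes "é", and on its own it is not a valid UTF-8 sequence. The Python sketch below shows one way to test whether some bytes are valid UTF-8 and to re-encode them if they are not. The helper name to_utf8 and the Latin-1 fallback are assumptions made for illustration; they are not part of tokenizer.perl, and the real file may use a different encoding.

```python
# Sketch only: to_utf8 and the Latin-1 fallback are illustrative assumptions,
# not anything taken from tokenizer.perl or the Moses scripts.

def to_utf8(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Return raw unchanged if it is valid UTF-8, else re-encode it from fallback."""
    try:
        raw.decode("utf-8")  # strict decoding: raises on malformed input
        return raw
    except UnicodeDecodeError:
        # Assume the bytes are in the fallback encoding and convert them.
        return raw.decode(fallback).encode("utf-8")


latin1_line = "café".encode("latin-1")  # b'caf\xe9', contains the \xE9 byte
fixed = to_utf8(latin1_line)
print(fixed)  # b'caf\xc3\xa9'
```

If the file does turn out to be Latin-1, the same conversion can be done on the command line with iconv, for example iconv -f ISO-8859-1 -t UTF-8 test-fr.fr > test-fr.utf8.fr (the output file name here is only an example).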
Your input is maybe not UTF-8?

Ingrid

On 06/27/2010 01:08 PM, Cyrine NASRI wrote:
Hello everyone,

I am trying to run the tokenizer.perl script on my two development files. I get a problem when running it, but I do not understand why. This message appears:

/home/Bureau/moses/moses/scripts/tokenizer$ ./tokenizer.perl -l fr < /home/Bureau/work/test-fr.fr > /home/Bureau/work/input.tok
Tokenizer Version 1.0
Language: fr
WARNING: No known abbreviations for language 'fr', attempting fall-back to English version...
utf8 "\xE9" does not map to Unicode at ./tokenizer.perl line 47, <STDIN> line 1.
Malformed UTF-8 character (fatal) at ./tokenizer.perl line 67, <STDIN> line 1.

Thank you very much.

Sincerely,
Cyrine

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
