Hi Cyrine,

I think this happens because tokenizer.perl expects UTF-8 input on STDIN:
the script sets binmode(STDIN, ':utf8');, so input that is not valid UTF-8
produces errors like the ones you see.

Is your input file perhaps not encoded as UTF-8?
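
In case it helps, here is a minimal sketch of how one could check and convert a file with iconv. It assumes the source encoding is ISO-8859-1 (Latin-1), which is only a guess based on the "\xE9" in the error message (that byte is "é" in Latin-1). The file names sample-latin1.txt and sample-utf8.txt are made up for the demo and are not part of Moses.

```shell
# Build a small demo file containing the byte 0xE9 ("é" in ISO-8859-1).
# \351 is the octal escape for 0xE9. The file name is hypothetical.
printf 'caf\351\n' > sample-latin1.txt

# iconv exits with a non-zero status when the input is not valid
# in the declared source encoding, so this acts as a UTF-8 validity check.
if iconv -f UTF-8 -t UTF-8 sample-latin1.txt > /dev/null 2>&1; then
    echo "input is valid UTF-8"
else
    echo "input is not valid UTF-8"
fi

# Convert to UTF-8, ASSUMING the source really is ISO-8859-1.
iconv -f ISO-8859-1 -t UTF-8 sample-latin1.txt > sample-utf8.txt
```

If the same check reports that your test-fr.fr is not valid UTF-8, converting it in the same way and feeding the converted file to tokenizer.perl should avoid the error, provided the guess about the source encoding is right.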

Ingrid

On 06/27/2010 01:08 PM, Cyrine NASRI wrote:
Hello everyone,
I am trying to run the tokenizer.perl script on my two development files.
It fails when I run it, and I do not understand why.
The following message appears:

  /home/Bureau/moses/moses/scripts/tokenizer$ ./tokenizer.perl -l fr < /home/Bureau/work/test-fr.fr > /home/Bureau/work/input.tok
Tokenizer Version 1.0
Language: fr
WARNING: No known abbreviations for language 'fr', attempting fall-back to English version...
utf8 "\xE9" does not map to Unicode at ./tokenizer.perl line 47, <STDIN> line 1.
Malformed UTF-8 character (fatal) at ./tokenizer.perl line 67, <STDIN> line 1.

Thank you very much.

Sincerely
Cyrine



_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
