On the subject of UTF-8, I think the Moses tokeniser may be using a
version of the UTF-8 handling that is too strict.

I've just changed it to this:

    binmode(STDIN, ":encoding(UTF-8)");
    binmode(STDOUT, ":encoding(UTF-8)");
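To test Ingrid's suggestion that the input may simply not be UTF-8, here is a minimal Perl sketch (not part of the Moses scripts). It checks whether a byte string is well-formed UTF-8 using Encode's strict 'UTF-8' decoder, the same encoding name used in the layers above. The helper names is_valid_utf8 and to_utf8_bytes are hypothetical. Falling back to ISO-8859-1 (Latin-1) when the bytes are not valid UTF-8 is an assumption about where the input came from, though it would fit the error below, since byte 0xE9 is "é" in Latin-1.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: true if $bytes is well-formed UTF-8 under Encode's
# strict 'UTF-8' encoding, false otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input; LEAVE_SRC stops it
    # from consuming the source string in place.
    my $ok = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# Hypothetical helper: decode as UTF-8 when possible, otherwise ASSUME the
# bytes are ISO-8859-1 (Latin-1), then return UTF-8 encoded bytes.
sub to_utf8_bytes {
    my ($bytes) = @_;
    my $text = is_valid_utf8($bytes)
        ? decode('UTF-8', $bytes)
        : decode('ISO-8859-1', $bytes);
    return encode('UTF-8', $text);
}

# "\xE9" is e-acute in Latin-1 but is not valid UTF-8 on its own.
my $latin1 = "caf\xE9";
my $utf8   = "caf\xC3\xA9";
printf "latin1 valid? %d\n", is_valid_utf8($latin1);    # prints 0
printf "utf8 valid?   %d\n", is_valid_utf8($utf8);      # prints 1
print "same after conversion\n" if to_utf8_bytes($latin1) eq $utf8;
```

Converting a Latin-1 file once, before it reaches tokenizer.perl, would let the tokeniser keep its strict UTF-8 layers.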


And later on in the same file:

    open(PREFIX, "<:encoding(UTF-8)", "$prefixfile");
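For context, here is a minimal Perl sketch of reading a prefix file through a UTF-8 decoding layer. The sub name load_prefixes, the hash values (1 and 2) and the file format (one prefix per line, lines starting with "#" ignored, an optional #NUMERIC_ONLY# marker) are assumptions for illustration, not a copy of what tokenizer.perl actually does.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sketch: load nonbreaking prefixes from a UTF-8 file.
# ASSUMED format: one prefix per line, '#' comment lines, and an optional
# "#NUMERIC_ONLY#" marker after the prefix.
sub load_prefixes {
    my ($prefixfile) = @_;
    my %prefix;
    open(my $fh, '<:encoding(UTF-8)', $prefixfile)
        or die "Cannot open $prefixfile: $!";
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(#|$)/;     # skip comments and blank lines
        if ($line =~ /^(\S+)\s+#NUMERIC_ONLY#/) {
            $prefix{$1} = 2;              # non-breaking only before a number
        } else {
            $line =~ s/\s+$//;
            $prefix{$line} = 1;           # always non-breaking
        }
    }
    close($fh);
    return \%prefix;
}

# Example: write a small UTF-8 prefix file (including "N\x{B0}", i.e. N°)
# and load it back.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp, ':encoding(UTF-8)');
print $tmp "# comment\nM\nMme\nN\x{B0} #NUMERIC_ONLY#\n";
close($tmp);

binmode(STDOUT, ':encoding(UTF-8)');
my $p = load_prefixes($path);
print join(',', map { "$_=$p->{$_}" } sort keys %$p), "\n";
```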

See if this helps.

Miles

On 27 June 2010 13:15, Ingrid Falk wrote:
> Hi Cyrine,
>
> I think this is because tokenizer.perl expects utf-8 input (on STDIN).
>
> This is because of the binmode(STDIN, ':utf8'); line in the tokenizer
> script.
>
> Maybe your input is not UTF-8?
>
> Ingrid
>
> On 06/27/2010 01:08 PM, Cyrine NASRI wrote:
>>
>> Hello everyone,
>> I am trying to run the tokenizer.perl script on my two development files.
>> I get an error when I run it, and I do not understand why.
>> The following message appears:
>>
>>  /home/Bureau/moses/moses/scripts/tokenizer$ ./tokenizer.perl -l fr < /home/Bureau/work/test-fr.fr > /home/Bureau/work/input.tok
>> Tokenizer Version 1.0
>> Language: fr
>> WARNING: No known abbreviations for language 'fr', attempting fall-back to English version...
>> utf8 "\xE9" does not map to Unicode at ./tokenizer.perl line 47, <STDIN> line 1.
>> Malformed UTF-8 character (fatal) at ./tokenizer.perl line 67, <STDIN> line 1.
>>
>> Thank you very much.
>>
>> Sincerely
>> Cyrine
>>
>>
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
