Alexander V Vershilov <alexander.vershi...@gmail.com> writes:

> The problem is that Prelude.getLine uses the current locale to decode
> characters. For example, with a UTF-8 locale everything works out of the box:
>
>> $ runhaskell 1.hs
>> résumé 履歴書 резюме
>> résumé 履歴書 резюме
>
> But if you change the locale you get an error:
>
>> $ LANG="C" runhaskell 1.hs
>> résumé 履歴書 резюме
>> 1.hs: <stdin>: hGetLine: invalid argument (invalid byte sequence)

That seems to be correct behaviour: the only way to know the
meaning of the bits input by a user is what encoding the user
says they are in.
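
To see which encoding GHC has picked up from the locale, something like
the following sketch may help. It uses only `getLocaleEncoding` and
`hGetEncoding` from base; the helper name `encodingNames` is made up for
this illustration, and the printed names depend on the environment.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (hGetEncoding, stdin)

-- Hypothetical helper: names of the locale encoding and of the
-- encoding currently attached to stdin (Nothing means binary mode).
encodingNames :: IO (String, Maybe String)
encodingNames = do
  loc <- getLocaleEncoding      -- derived from LANG / LC_ALL etc.
  enc <- hGetEncoding stdin     -- what getLine will actually decode with
  return (show loc, fmap show enc)

main :: IO ()
main = do
  (loc, enc) <- encodingNames
  putStrLn ("locale encoding: " ++ loc)
  putStrLn ("stdin encoding:  " ++ maybe "(binary)" id enc)
```

Running it once normally and once with LANG="C" should show the two
settings that produce the different behaviour quoted above.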

But in general this issue is an instance of inheriting sins from
the OS: the meaning of the bit pattern in a file should be part
of the file, but we are stuck with OSs that use a global
variable (which should be anathema to Haskell). So if user A has
locale set one way and inputs a file and sends the filename to
user B on the same system, user B might well see something
completely different to A when looking at the file.

> To force Haskell to use UTF-8 you can load the string as a byte sequence
> and convert it to UTF-8 characters

but of course, the programmer can only hope that UTF-8 will work
here. If the user is typing in KOI8-R, reading it as UTF-8 is
going to be wrong.
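
A minimal sketch of the byte-sequence approach, assuming the bytestring
and text packages are available (1.hs itself is not shown in this thread,
so this is not a reconstruction of it). The helper name `decodeLine` is
made up for illustration. It uses `decodeUtf8'` so that input which is not
valid UTF-8 is reported rather than thrown as an exception:

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import Data.Text.Encoding (decodeUtf8')
import System.IO (hPutStrLn, hSetEncoding, stderr, stdout, utf8)

-- Hypothetical helper: decode raw bytes as UTF-8, returning the
-- decoder's error message instead of throwing on invalid input.
decodeLine :: B.ByteString -> Either String T.Text
decodeLine bytes = case decodeUtf8' bytes of
  Left err  -> Left (show err)
  Right txt -> Right txt

main :: IO ()
main = do
  -- Write UTF-8 regardless of the locale; this is itself an assumption
  -- about what the terminal expects.
  hSetEncoding stdout utf8
  raw <- BC.getLine             -- raw bytes, no locale decoding
  case decodeLine raw of
    Left err  -> hPutStrLn stderr ("input is not valid UTF-8: " ++ err)
    Right txt -> TIO.putStrLn txt
```

Typical KOI8-R text will usually fail this decode, which at least makes
the wrong guess visible, but some byte sequences are valid in more than
one encoding, so a successful decode does not prove the guess was right.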
-- 
Jón Fairbairn                                 jon.fairba...@cl.cam.ac.uk


_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe