Alexander V Vershilov <alexander.vershi...@gmail.com> writes:

> The problem is that Prelude.getLine uses current locale to load characters:
> for example if you have utf8 locale, then everything works out of the box:
>
>> $ runhaskell 1.hs
>> résumé 履歴書 резюме
>> résumé 履歴書 резюме
>
> But if you change locale you'll have error:
>
>> LANG="C" runhaskell 1.hs
>> résumé 履歴書 резюме
>> 1.hs: <stdin>: hGetLine: invalid argument (invalid byte sequence)

That seems to be correct behaviour: the only way to know the meaning
of the bits input by a user is the encoding the user says they are
in. But in general this issue is an instance of inheriting sins from
the OS: the meaning of the bit pattern in a file should be part of
the file, but we are stuck with OSs that use a global variable (which
should be anathema to Haskell). So if user A has their locale set one
way, writes a file and sends the filename to user B on the same
system, user B might well see something completely different to A
when looking at the file.

> To force haskell use UTF8 you can load string as byte sequence
> and convert it to UTF-8 characters

but of course, the programmer can only hope that UTF-8 will work
here. If the user is typing in KOI8-R, reading it as UTF-8 is going
to be wrong.

-- 
Jón Fairbairn                                 jon.fairba...@cl.cam.ac.uk

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
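
The point about guessing the encoding can be illustrated with a small
sketch. It is not from the thread: it assumes the bytestring and text
packages are available, the names (utf8Bytes, latin1Bytes, asUtf8,
asLatin1) are made up for illustration, and it uses Latin-1 as a
stand-in for KOI8-R only because text ships a Latin-1 decoder. The same
word arrives as different bytes depending on the sender's encoding, and
decoding with the wrong guess either fails or silently gives garbage.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeLatin1, decodeUtf8', encodeUtf8)

-- Bytes a user with a UTF-8 locale would send for "résumé"
-- ('\233' is the code point of 'é').
utf8Bytes :: B.ByteString
utf8Bytes = encodeUtf8 (T.pack "r\233sum\233")

-- Bytes a user with a Latin-1 locale would send for the same word.
latin1Bytes :: B.ByteString
latin1Bytes = B.pack [0x72, 0xE9, 0x73, 0x75, 0x6D, 0xE9]

-- Interpret bytes as UTF-8; Nothing means the bytes are not valid UTF-8.
asUtf8 :: B.ByteString -> Maybe String
asUtf8 = either (const Nothing) (Just . T.unpack) . decodeUtf8'

-- Interpret bytes as Latin-1; this never fails, so a wrong guess
-- goes unnoticed.
asLatin1 :: B.ByteString -> String
asLatin1 = T.unpack . decodeLatin1

main :: IO ()
main = do
  print (asUtf8 utf8Bytes)      -- Just "r\233sum\233"
  print (asUtf8 latin1Bytes)    -- Nothing: the guess was wrong and detected
  print (asLatin1 latin1Bytes)  -- "r\233sum\233"
  print (asLatin1 utf8Bytes)    -- "r\195\169sum\195\169": wrong, no error
```

The last line is the uncomfortable case: a decoder that accepts every
byte sequence cannot report that the guess was wrong, which is why the
program can only hope unless the encoding travels with the data.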