Re[2]: [Haskell-cafe] How to use Unicode strings?
Hello Alexey,

Sunday, November 23, 2008, 10:20:47 AM, you wrote:

> And this problem is not limited to IO. It arises whenever strings cross
> the border between the Haskell world and the outside world: opening files
> with Unicode names, exec'ing, etc.

This depends entirely on the libraries, and the I/O libraries bundled with GHC don't support Unicode filenames. The FreeArc project contains its own simple I/O library that doesn't have this problem (and also supports files larger than 4 GB on Windows). Unfortunately, this library doesn't include any buffering.

-- 
Best regards,
 Bulat mailto:[EMAIL PROTECTED]

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] How to use Unicode strings?
Alexey Khudyakov wrote:
> But this raises the question of what the right thing is. If the locale is
> UTF-8, or the system supports Unicode some other way, there is no problem:
> just encode the string properly. The problem is how to deal with
> untranslatable characters. Skip them? Replace them with question marks?
> Something else?
>
> Probably we need to look at how this is solved in other languages. (Or not
> solved.)

Regarding untranslatable characters, I think the only correct thing to do is to consider it exceptional behavior and have the conversion function accept a handler function which takes the character as input and produces a string for it. That way programs can define their own behavior, since this is something that doesn't have a right way to recover in all cases. Canonical handlers which skip, replace with question marks (or some other arbitrary character), throw actual exceptions, etc. could be provided for convenience.

For stream-based strings à la ByteString, dealing with this sort of handler in an efficient manner is fairly straightforward (though some CPS tricks may be needed to get rid of the Maybe in the result of the basic converter). For [Char] strings efficiency is harder, but the implementation should still be easy (given the basic converter).

Most extant languages I've seen tend to pick a single solution for all cases, but I don't think we should follow along that path.

-- 
Live well,
~wren
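A minimal sketch of the handler idea above, for [Char] strings only. Everything here is hypothetical and not taken from any existing library: the names toLatin1, encodeWith, skipChar, questionMark and failOnChar are made up for illustration, and Latin-1 is chosen as the target encoding only because its "basic converter" fits in three lines.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- | Hypothetical basic converter. Latin-1 can represent exactly the code
-- points 0..255, so every other character is untranslatable.
toLatin1 :: Char -> Maybe Word8
toLatin1 c
  | ord c < 256 = Just (fromIntegral (ord c))
  | otherwise   = Nothing

-- | Encode a string, asking the handler for a replacement string whenever a
-- character is untranslatable. Characters in the replacement that are
-- themselves untranslatable are dropped, so the handler is never re-entered.
encodeWith :: (Char -> String) -> String -> [Word8]
encodeWith handler = concatMap encodeChar
  where
    encodeChar c = case toLatin1 c of
      Just b  -> [b]
      Nothing -> [b | Just b <- map toLatin1 (handler c)]

-- Canonical handlers, as suggested in the message above.

-- | Skip the untranslatable character.
skipChar :: Char -> String
skipChar _ = ""

-- | Replace the untranslatable character with a question mark.
questionMark :: Char -> String
questionMark _ = "?"

-- | Fail loudly. 'error' stands in for a proper exception type here.
failOnChar :: Char -> String
failOnChar c = error ("untranslatable character: " ++ show c)
```

A program then picks its policy at the call site, e.g. `encodeWith questionMark str` or `encodeWith skipChar str`. This version does nothing for efficiency; the CPS tricks mentioned above would matter for a ByteString-based converter.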
[Haskell-cafe] How to use Unicode strings?
Please advise how to write a Unicode string so that this example works:

    main = do
      putStrLn "Les signes orthographiques inclus les accents (aigus, grâve, circonflexe), le tréma, l'apostrophe, la cédille, le trait d'union et la majuscule."

I get the following error:

    hello.hs:4:68:
        lexical error in string/character literal (UTF-8 decoding error)
    Failed, modules loaded: none.
    Prelude>

Also, how do I read Unicode characters from standard input?

Thanks!

-- 
Dmitri O. Kondratiev
[EMAIL PROTECTED]
http://www.geocities.com/dkondr
Re: [Haskell-cafe] How to use Unicode strings?
2008/11/22 Dmitri O.Kondratiev [EMAIL PROTECTED]:
> Please advise how to write a Unicode string so that this example works:
>
>     main = do
>       putStrLn "Les signes orthographiques inclus les accents (aigus, grâve, circonflexe), le tréma, l'apostrophe, la cédille, le trait d'union et la majuscule."

That really ought to work. Is the file encoded in UTF-8 (rather than, e.g., Latin-1)?

Luke
Re: [Haskell-cafe] How to use Unicode strings?
Excerpts from Dmitri O.Kondratiev's message of Sat Nov 22 05:40:41 -0600 2008:
> Please advise how to write a Unicode string so that this example works:
>
>     main = do
>       putStrLn "Les signes orthographiques inclus les accents (aigus, grâve, circonflexe), le tréma, l'apostrophe, la cédille, le trait d'union et la majuscule."
>
> I get the following error:
>
>     hello.hs:4:68:
>         lexical error in string/character literal (UTF-8 decoding error)
>     Failed, modules loaded: none.
>     Prelude>
>
> Also, how do I read Unicode characters from standard input?
>
> Thanks!

Hi,

Check out the utf8-string package on Hackage:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/utf8-string

In particular, you probably want the System.IO.UTF8 functions, which are identical to their non-UTF-8 counterparts in System.IO except, well, they handle Unicode properly.

More specifically, you will probably want to look mainly at Codec.Binary.UTF8.String.encodeString and decodeString. In fact, most of the System.IO.UTF8 functions are defined in terms of these, e.g. 'putStrLn x = IO.putStrLn (encodeString x)' and 'getLine = liftM decodeString IO.getLine'.

Austin
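To make the encodeString trick above concrete, here is a rough, self-contained sketch of the encoding direction. It is not the utf8-string implementation: utf8Bytes and encodeUtf8Chars are hypothetical names, and the sketch does not validate its input (surrogate code points are not rejected). The idea is that each Char of the result holds one UTF-8 byte, so an I/O function that writes only the low 8 bits of every Char emits valid UTF-8.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (chr, ord)

-- | UTF-8 encode a single code point as a list of byte values (0..255).
utf8Bytes :: Char -> [Int]
utf8Bytes c
  | n < 0x80    = [n]
  | n < 0x800   = [0xC0 .|. (n `shiftR` 6), cont n]
  | n < 0x10000 = [0xE0 .|. (n `shiftR` 12), cont (n `shiftR` 6), cont n]
  | otherwise   = [ 0xF0 .|. (n `shiftR` 18), cont (n `shiftR` 12)
                  , cont (n `shiftR` 6), cont n ]
  where
    n = ord c
    -- Continuation byte: the bits 10 followed by the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)

-- | Hypothetical stand-in for encodeString: the result is a String in which
-- every Char is a single byte of the UTF-8 encoding of the input.
encodeUtf8Chars :: String -> String
encodeUtf8Chars = map chr . concatMap utf8Bytes
```

For example, `map ord (encodeUtf8Chars "é")` is `[195,169]`, the two bytes C3 A9. Note that this only helps when the output function passes bytes through unchanged; with an output function that already encodes according to the locale, the text would be encoded twice.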
Re: [Haskell-cafe] How to use Unicode strings?
Alexey Khudyakov wrote:
>     putStrLn "Ну и где этот ваш хвалёный уникод?"

:-)

-- 
Dr. Janis Voigtlaender
http://wwwtcs.inf.tu-dresden.de/~voigt/
mailto:[EMAIL PROTECTED]
Re: [Haskell-cafe] How to use Unicode strings?
alexey.skladnoy:
> > That really ought to work. Is the file encoded in UTF-8 (rather than,
> > e.g., Latin-1)?
>
> It only appears to work. The simple print functions garble Unicode
> characters. For example:
>
>     putStrLn "Ну и где этот ваш хвалёный уникод?"
>
> prints the following output:
>
>     C 8 345 MBB 20H E20;Q=K9 C=8:4?
>
> Not pretty, is it? Dmitri's variant seems to work fine, though.

Use the UTF8 printing functions:

    import qualified System.IO.UTF8 as U

    main = U.putStrLn "Ну и где этот ваш хвалёный уникод?"

Running this:

    *Main> main
    Ну и где этот ваш хвалёный уникод?

-- Don
Re: [Haskell-cafe] How to use Unicode strings?
On Sat, 2008-11-22 at 10:02 -0800, Don Stewart wrote:
> Use the UTF8 printing functions:
>
>     import qualified System.IO.UTF8 as U
>
>     main = U.putStrLn "Ну и где этот ваш хвалёный уникод?"
>
> Running this:
>
>     *Main> main
>     Ну и где этот ваш хвалёный уникод?

This upsets me. We need to get on with doing this properly.

The System.IO.UTF8 module is a useful interim workaround, but we're not using it properly most of the time.

It is right when you're working with a text file that you know to be in UTF-8 format. For example, .cabal files are UTF-8, irrespective of the platform or the system locale.

It is not right when working with the terminal. The encoding of the terminal is given by the locale. We cannot statically declare that it is UTF-8.

The right thing to do is to make Prelude.putStrLn do the right thing. We had a long discussion on how to fix the H98 IO functions to do this better. We just need to get on with it, or we'll end up with too many cases of people using System.IO.UTF8 inappropriately.

For the case where System.IO.UTF8 is right, we probably still want a more general solution, like a handle setting for the encoding.

Duncan
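For illustration, this is roughly what a per-handle encoding setting looks like. The sketch assumes a GHC newer than the ones discussed in this thread: from GHC 6.12 on, System.IO exports hSetEncoding, utf8 and localeEncoding. The helper names writeFileUtf8 and readFileUtf8 are made up for this example.

```haskell
import System.IO

-- | Write a String to a file that is known to be UTF-8 (a .cabal file, say),
-- irrespective of the system locale.
writeFileUtf8 :: FilePath -> String -> IO ()
writeFileUtf8 path s =
  withFile path WriteMode $ \h -> do
    hSetEncoding h utf8   -- encoding chosen per handle, not globally
    hPutStr h s

-- | Read a UTF-8 file. The contents are forced before withFile closes the
-- handle, because hGetContents is lazy.
readFileUtf8 :: FilePath -> IO String
readFileUtf8 path =
  withFile path ReadMode $ \h -> do
    hSetEncoding h utf8
    s <- hGetContents h
    length s `seq` return s
```

With this design the terminal case needs no code at all: stdout and stdin are left alone and pick up localeEncoding by default, while files with a fixed format get an explicit encoding on their own handle.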
Re: [Haskell-cafe] How to use Unicode strings?
> This upsets me. We need to get on with doing this properly.
>
> The System.IO.UTF8 module is a useful interim workaround, but we're not
> using it properly most of the time.
>
> ... skipped ...
>
> The right thing to do is to make Prelude.putStrLn do the right thing. We
> had a long discussion on how to fix the H98 IO functions to do this better.
> We just need to get on with it, or we'll end up with too many cases of
> people using System.IO.UTF8 inappropriately.

But this raises the question of what the right thing is. If the locale is UTF-8, or the system supports Unicode some other way, there is no problem: just encode the string properly. The problem is how to deal with untranslatable characters. Skip them? Replace them with question marks? Something else?

Probably we need to look at how this is solved in other languages. (Or not solved.)

And this problem is not limited to IO. It arises whenever strings cross the border between the Haskell world and the outside world: opening files with Unicode names, exec'ing, etc. For example:

    Prelude> readFile "файл"
    *** Exception: D09;: openFile: does not exist (No such file or directory)
    Prelude> executeFile "echo" True ["Сейчас сломается"] Nothing
    !59G0A A;05BAO

Although it's possible to work around this using encodeString/decodeString from Codec.Binary.UTF8.String, that won't work on non-UTF-8 systems. And it's not only Neanderthal systems with one-byte locales: Windows, AFAIK, uses a different Unicode encoding.
Re: [Haskell-cafe] How to use Unicode strings?
alexey.skladnoy:
> > This upsets me. We need to get on with doing this properly.
> >
> > The System.IO.UTF8 module is a useful interim workaround, but we're not
> > using it properly most of the time.
> >
> > ... skipped ...
> >
> > The right thing to do is to make Prelude.putStrLn do the right thing. We
> > had a long discussion on how to fix the H98 IO functions to do this
> > better. We just need to get on with it, or we'll end up with too many
> > cases of people using System.IO.UTF8 inappropriately.
>
> But this raises the question of what the right thing is. If the locale is
> UTF-8, or the system supports Unicode some other way, there is no problem:
> just encode the string properly. The problem is how to deal with
> untranslatable characters. Skip them? Replace them with question marks?
> Something else?
>
> Probably we need to look at how this is solved in other languages. (Or not
> solved.)
>
> And this problem is not limited to IO. It arises whenever strings cross
> the border between the Haskell world and the outside world: opening files
> with Unicode names, exec'ing, etc. For example:
>
>     Prelude> readFile "файл"
>     *** Exception: D09;: openFile: does not exist (No such file or directory)
>     Prelude> executeFile "echo" True ["Сейчас сломается"] Nothing
>     !59G0A A;05BAO
>
> Although it's possible to work around this using encodeString/decodeString
> from Codec.Binary.UTF8.String, that won't work on non-UTF-8 systems. And
> it's not only Neanderthal systems with one-byte locales: Windows, AFAIK,
> uses a different Unicode encoding.

For just decoding and encoding in other locales, there are codec libraries. Hunt around on Hackage:

http://hackage.haskell.org/cgi-bin/hackage-scripts/package/encoding
http://hackage.haskell.org/cgi-bin/hackage-scripts/package/Encode

-- Don