Re: [Haskell-cafe] Strings and utf-8
Bulat Ziganshin wrote:
> Hello Andrew,
>
> Thursday, November 29, 2007, 1:11:38 AM, you wrote:
>>> IMHO, someone should make a full proposal by implementing an
>>> alternative System.IO library that deals with all these encoding
>>> issues and implements H98 IO in terms of that.
>> We need two separate interfaces. One for text-mode I/O, one for raw
>> binary I/O. When doing text-mode I/O, the programmer needs to be able
>> to explicitly specify exactly which character encoding is required.
>> (Presumably default to the current 8-bit truncation encoding?)
> http://haskell.org/haskellwiki/Library/Streams already exists

Which would mean that we have streams to do character I/O, ByteString to do binary I/O, and System.IO to do, eh, something in between. That seems rather unfortunate to me.

While the "truncate to 8 bits" semantics may be nice to keep old code working, it really isn't all that intuitive. When I do 'putStr "u\776"', I want a u with an umlaut to appear, not to get it printed as if it were "u\8". The strange thing is that Hugs at the moment _does_ print a u-umlaut, while ghci prints "u\8", which is a u followed by a backspace, so I see nothing.

Reinier
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
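The u\776 versus u\8 observation is plain arithmetic: 776 = 3 * 256 + 8, so keeping only the low 8 bits of code point 776 leaves 8, which is backspace. A minimal sketch of that, assuming the truncation simply drops the high bits; truncate8 is a name made up for illustration and is not a System.IO function:

```haskell
import Data.Char (chr, ord)

-- Hypothetical helper: keep only the low 8 bits of every code point,
-- mimicking the "truncate to 8 bits" behaviour described in this thread.
truncate8 :: String -> String
truncate8 = map (chr . (`mod` 256) . ord)
```

In GHCi, truncate8 "u\776" == "u\b" evaluates to True, matching what ghci was reported to print.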
Re[2]: [Haskell-cafe] Strings and utf-8
Hello Reinier,

Thursday, November 29, 2007, 1:13:24 PM, you wrote:
>>> IMHO, someone should make a full proposal by implementing an
>>> alternative System.IO library that deals with all these encoding
>>> issues and implements H98 IO in terms of that.
>> http://haskell.org/haskellwiki/Library/Streams already exists
> Which would mean that we have streams to do character I/O, ByteString
> to do binary I/O, and System.IO to do, eh, something in between.

this means only that such a proposal exists. i've worked on adding bytestream support too, but didn't finish the work. at least it's possible. i hope that the new i/o library will have a modular design like this, so it will be easy to add new features as 3rd-party libs

-- 
Best regards,
Bulat
Re: [Haskell-cafe] Strings and utf-8
On Tue, 2007-11-27 at 18:38 +, Paul Johnson wrote:
> Brandon S. Allbery KF8NH wrote:
>> However, the IO system truncates [characters] to 8 bits.
> Should this be considered a bug?

A design problem.

> I presume that it's because stdio.h was defined in the days of
> ASCII-only strings, and the functions in System.IO are defined in
> terms of stdio.h. But does this need to be the case in the future?

When it's phrased as "truncates to 8 bits" it sounds so simple; surely all we need to do is not truncate to 8 bits, right?

The problem is, what encoding should it pick? UTF-8, 16, 32, EBCDIC? How would people specify that they really want to use a binary file? Whatever we change, it'll break programs that use the existing meanings.

One sensible suggestion many people have made is that H98 file IO should use the locale encoding and do Unicode/String <-> locale conversion. So that'd all be text files. Then openBinaryFile would be used for binary files. Of course then we'd need control over setting the encoding and over what to do on encountering encoding errors.

IMHO, someone should make a full proposal by implementing an alternative System.IO library that deals with all these encoding issues and implements H98 IO in terms of that. It doesn't have to be fast initially; it just has to get the API right and not design the API so as to exclude the possibility of a fast implementation later.

Duncan
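For comparison, GHC releases newer than the 6.6.1 discussed in this thread export hSetEncoding, utf8 and withBinaryFile from System.IO, which gives roughly the text/binary split described above. A small sketch under that assumption; writeUtf8 and readBytes are names invented here, not library functions:

```haskell
import System.IO

-- Text side: write a String through a handle whose encoding is set explicitly.
writeUtf8 :: FilePath -> String -> IO ()
writeUtf8 path s = withFile path WriteMode $ \h -> do
  hSetEncoding h utf8
  hPutStr h s

-- Binary side: on a binary handle every byte arrives as a Char in '\0'..'\255'.
readBytes :: FilePath -> IO [Int]
readBytes path = withBinaryFile path ReadMode $ \h -> do
  s <- hGetContents h
  let bytes = map fromEnum s
  -- hGetContents is lazy, so force the whole list before the handle closes.
  length bytes `seq` return bytes
```

Writing "u\776" with writeUtf8 and reading the file back with readBytes shows three bytes (u, then the two-byte UTF-8 form of the combining diaeresis) rather than a truncated backspace.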
Re: [Haskell-cafe] Strings and utf-8
Duncan Coutts wrote:
> When it's phrased as "truncates to 8 bits" it sounds so simple; surely
> all we need to do is not truncate to 8 bits, right?
>
> The problem is, what encoding should it pick? UTF-8, 16, 32, EBCDIC?
> How would people specify that they really want to use a binary file?
> Whatever we change, it'll break programs that use the existing
> meanings.
>
> One sensible suggestion many people have made is that H98 file IO
> should use the locale encoding and do Unicode/String <-> locale
> conversion. So that'd all be text files. Then openBinaryFile would be
> used for binary files. Of course then we'd need control over setting
> the encoding and over what to do on encountering encoding errors.
>
> IMHO, someone should make a full proposal by implementing an
> alternative System.IO library that deals with all these encoding
> issues and implements H98 IO in terms of that. It doesn't have to be
> fast initially; it just has to get the API right and not design the
> API so as to exclude the possibility of a fast implementation later.

In my humble opinion, what should happen is this:

We need two separate interfaces. One for text-mode I/O, one for raw binary I/O.

ByteString provides some of the latter. [Can you use that on network sockets?] I guess what's needed is a good binary library to go with it. [I know there have been quite a few people who've had a go at this part...]

When doing text-mode I/O, the programmer needs to be able to explicitly specify exactly which character encoding is required. (Presumably default to the current 8-bit truncation encoding?) That way the programmer can decide exactly how to choose an encoding, rather than the library designer trying to guess what The Right Thing is for all possible application programs. And it needs to be possible to cleanly add new encodings too.

I'd have a go at implementing all this myself, but I wouldn't know where to begin...
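One possible starting point for the "explicit, extensible encoding" idea, purely as a sketch: treat an encoding as an ordinary value holding a pair of conversion functions, so that adding a new encoding means defining a new value. Everything below (the Encoding record, char8, utf32be) is hypothetical and is not an existing library API.

```haskell
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical interface: an encoding is a named pair of conversions.
data Encoding = Encoding
  { encName :: String
  , encode  :: String -> [Word8]
  , decode  :: [Word8] -> Maybe String   -- Nothing on malformed input
  }

-- The current behaviour expressed as an explicit encoding:
-- fromIntegral to Word8 keeps only the low 8 bits of each code point.
char8 :: Encoding
char8 = Encoding "char8"
  (map (fromIntegral . ord))
  (Just . map (chr . fromIntegral))

-- UTF-32BE: four bytes per code point, most significant byte first.
utf32be :: Encoding
utf32be = Encoding "UTF-32BE" (concatMap enc) dec
  where
    enc ch = [ fromIntegral (ord ch `shiftR` s) | s <- [24, 16, 8, 0] ]
    dec [] = Just []
    dec (a : b : c : d : rest) = do
      let n = foldl (\acc w -> (acc `shiftL` 8) .|. fromIntegral w) 0 [a, b, c, d] :: Int
      ch <- toChar n
      (ch :) <$> dec rest
    dec _ = Nothing                      -- length is not a multiple of four
    toChar n
      | n < 0 || n > 0x10FFFF        = Nothing   -- outside the Unicode range
      | n >= 0xD800 && n <= 0xDFFF   = Nothing   -- surrogate, not a character
      | otherwise                    = Just (chr n)
```

A text-mode writer would then take an Encoding argument and pass the resulting bytes to a binary handle, which leaves the choice of encoding, and of what to do with Nothing on decoding, to the application.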
Re[2]: [Haskell-cafe] Strings and utf-8
Hello Andrew,

Thursday, November 29, 2007, 1:11:38 AM, you wrote:
>> IMHO, someone should make a full proposal by implementing an
>> alternative System.IO library that deals with all these encoding
>> issues and implements H98 IO in terms of that.
> We need two separate interfaces. One for text-mode I/O, one for raw
> binary I/O. When doing text-mode I/O, the programmer needs to be able
> to explicitly specify exactly which character encoding is required.
> (Presumably default to the current 8-bit truncation encoding?)

http://haskell.org/haskellwiki/Library/Streams already exists

-- 
Best regards,
Bulat
Re: [Haskell-cafe] Strings and utf-8
Brandon S. Allbery KF8NH wrote:
> However, the IO system truncates [characters] to 8 bits.

Should this be considered a bug? I presume that it's because stdio.h was defined in the days of ASCII-only strings, and the functions in System.IO are defined in terms of stdio.h. But does this need to be the case in the future? Unfortunately I don't know enough about Unicode IO to judge.

Paul.
[Haskell-cafe] Strings and utf-8
Hi,

Are 'String's in GHC 6.6.1 UTF-8?

Thanks,
Maurício
Re: [Haskell-cafe] Strings and utf-8
On Nov 26, 2007, at 19:23, Maurício wrote:
> Are 'String's in GHC 6.6.1 UTF-8?

No.

  type String = [Char]

and Char stores Unicode codepoints. However, the IO system truncates them to 8 bits.

I think there are UTF-8 marshaling libraries on Hackage these days, though.

-- 
brandon s. allbery [solaris,freebsd,perl,pugs,haskell]
system administrator [openafs,heimdal,too many hats]
electrical and computer engineering, carnegie mellon university    KF8NH
Re: [Haskell-cafe] Strings and utf-8
allbery:
> On Nov 26, 2007, at 19:23, Maurício wrote:
>> Are 'String's in GHC 6.6.1 UTF-8?
>
> No.
>
>   type String = [Char]
>
> and Char stores Unicode codepoints. However, the IO system truncates
> them to 8 bits.
>
> I think there are UTF-8 marshaling libraries on Hackage these days,
> though.

Yep, utf8-string, in particular.

-- Don
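To make concrete what "UTF-8 marshaling" of a String involves, here is a hand-rolled sketch using only base. It is not the API of the library named above; encodeChar and encodeUtf8 are illustrative names, and the code assumes the input contains no surrogate code points (it does not reject them).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Encode one code point as 1 to 4 UTF-8 bytes.
-- Assumption: surrogates (U+D800..U+DFFF) are not filtered out here.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. hi 6,  cont 0]
  | n < 0x10000 = [0xE0 .|. hi 12, cont 6,  cont 0]
  | otherwise   = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  where
    n      = ord c
    -- Leading bits of the code point, placed in the first byte.
    hi s   = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six bits of the code point.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Encode a whole String, one Char at a time.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap encodeChar
```

In GHCi, encodeUtf8 "u\776" evaluates to [117,204,136]: the ASCII u stays one byte, and code point 776 becomes two bytes instead of being truncated.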