On Sat, 2002-08-10 at 12:03, anatoli wrote:
> --- Sven Moritz Hallberg <[EMAIL PROTECTED]> wrote:
> > I argue _strongly_ against associating some sort of locale state with
> > handles.
> >
> > 1) In agreement with Ashley's statements, file IO should use octets,
> > because that's what's in a file.
>
> By the same token, we should handle CR/LF/CR-LF/LF-CR mess by hand.
> (Files don't have lines in them, they are just sequences of octets.)

That's a good point, I'd forgotten about this mess. I think it's ugly,
though, to do it somewhere outside, pretending the issue's not there.
What I value about Haskell is its clean representation of reality.
Attaching all kinds of state to handles just isn't as clear as "Look
here, a file: It's a sequence of octets.", "Watch out though, each file
can use an entirely different encoding.", "The Char versions of the IO
functions will try to deal with encoding for you.", and "If you know you
need some special treatment, we have these functions blahblahblah..."

> I prefer somewhat higher-level view of files.

Of course, so do I. I just want the higher-level view to be implemented
in Haskell, not under the hood of some ominous "handle" type, which,
btw, will then no longer be simply a handle but some sort of great big
file IO "object". That's confusing for anyone who hasn't been exposed to
the C way of dealing with files. I'd rather teach some old people clean
concepts they might not be used to than repeat the same old yuck to
every new little programmer who's just starting.

> > 2) If you need to decode those octets to characters, or vice-versa,
> > compose a (de)serialization function before it.
>
> I *always* need that. (Except for binary IO). Might as well have this
> functionality built in a handle.

Well, then *always* use the Char functions. I don't see the point.

> > 3) A "best shot" character reading(or writing, for that matter)
> > function, will be convenient. This should probably use your current
> > locale, because when writing a character, you'll probably want to be
> > able to write your own language's characters correctly.
>
> I routinely read and write messages in three different languages that
> use three different encodings. All of them are my "own" languages.

Where is the problem? The system is not going to be able to decide which
one to use either way, so you must make the encoding explicit. Now we
just have to come up with a convenient way to do it.
Transforming between [Word8] and [Char] seems plausible to me.

> > 4) For decoding, we'll need some parsing functionality, as someone
> > already mentioned. With that we can have functions like parseUTF8.
> > "Associating a locale with a stream", as you put it, is a matter of, if
> > f is the raw Word8 stream, g = parseUTF8 f, where g is the Char stream,
> > parsed as UTF-8-encoded characters from f.
>
> A "Word8 stream" can be either Handle (Word8Handle?) or [Word8]. We can transform
> [Word8] to [Char], but not Word8Handle to CharHandle. I argue that the latter
> is needed as well.

The only reason for that would be efficiency. Simon said something about
that. I admit that I have no clue about it.

Sven Moritz
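
The composition described in point 4 (g = parseUTF8 f) and the CR/LF
handling mentioned earlier could look roughly like the sketch below. Only
the name parseUTF8 comes from the thread. The helper multi, the functions
normalizeNewlines and decodeText, and the choice to replace malformed
input with U+FFFD are assumptions made for illustration, since the
messages do not say how errors should be handled.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Decode a (possibly lazy) list of octets as UTF-8.
-- Assumption: malformed input yields U+FFFD and decoding resumes
-- at the octet after the offending lead byte.
parseUTF8 :: [Word8] -> String
parseUTF8 [] = []
parseUTF8 (b : bs)
  | b < 0x80              = chr (fromIntegral b) : parseUTF8 bs
  | b >= 0xC2 && b < 0xE0 = multi 1 (fromIntegral b .&. 0x1F) 0x80 bs
  | b >= 0xE0 && b < 0xF0 = multi 2 (fromIntegral b .&. 0x0F) 0x800 bs
  | b >= 0xF0 && b < 0xF5 = multi 3 (fromIntegral b .&. 0x07) 0x10000 bs
  | otherwise             = '\xFFFD' : parseUTF8 bs

-- Hypothetical helper: consume n continuation octets, starting from the
-- payload bits of the lead byte (acc). minCp rejects overlong encodings.
multi :: Int -> Int -> Int -> [Word8] -> String
multi n acc minCp bs =
  case splitAt n bs of
    (cont, rest)
      | length cont == n
      , all isCont cont
      , let cp = foldl step acc cont
      , cp >= minCp
      , cp <= 0x10FFFF
      , not (isSurrogate cp)
      -> chr cp : parseUTF8 rest
    _ -> '\xFFFD' : parseUTF8 bs
  where
    isCont c       = c .&. 0xC0 == 0x80
    step a c       = (a `shiftL` 6) .|. (fromIntegral c .&. 0x3F)
    isSurrogate cp = cp >= 0xD800 && cp <= 0xDFFF

-- Map CR-LF, LF-CR and lone CR to a single '\n', on the Char side.
normalizeNewlines :: String -> String
normalizeNewlines ('\r' : '\n' : cs) = '\n' : normalizeNewlines cs
normalizeNewlines ('\n' : '\r' : cs) = '\n' : normalizeNewlines cs
normalizeNewlines ('\r' : cs)        = '\n' : normalizeNewlines cs
normalizeNewlines (c : cs)           = c : normalizeNewlines cs
normalizeNewlines []                 = []

-- The "higher-level view" as plain function composition over octets.
decodeText :: [Word8] -> String
decodeText = normalizeNewlines . parseUTF8
```

Under these assumptions, decodeText [0x61, 0x0D, 0x0A, 0xC3, 0xA9] yields
the three characters 'a', newline and U+00E9. Another encoding would be
selected by swapping parseUTF8 for a different [Word8] -> [Char] function,
with no state attached to the handle.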
