On Sun, Nov 30, 2008 at 16:34:43 -0500, Gwern Branwen wrote:
> But as I said before, so far as I know,
> we shouldn't have to care about possibly erroneous strings because the
> only function in UTF8.lhs is 'encode :: String -> [Word8]', and the
> Haskell runtime will never give us a malformed String. (If we do get a
> bad string, certainly nothing encode can do will make the situation
> worse.)

So, I was just about to apply this last night before getting attacked
by another fit of paranoia. I'm going to need just a little bit more
patient coaxing here:

1. Looking at Wikipedia, obviously the most reliable and stable place
   to learn things, I see that UTF-8 has a payload of up to 21 bits.
2. Haskell Chars are 32 bits.
3. I presume that by a 'malformed' String, you mean "things which
   cannot be represented in 21 bits" or "not valid Unicode".

I'm sorry if my language is sloppy here, but I hope what I'm saying
makes sense.

So... how do we know that we will never get such a Char? Ian assures us
that "You can only put valid unicode values in a Char". But to what
extent is that true? For example, can we do something nasty and low
level to inadvertently produce those Chars? Is there a scenario where
this kind of thing may happen? Let's say we're reading in filenames.
What if, for some reason or another, the filenames use some crazy
encoding with lots of non-Unicode things?

That said, looking at
http://hackage.haskell.org/packages/archive/utf8-string/0.3.3/doc/html/src/Codec-Binary-UTF8-String.html#encode
vs.
http://darcs.net/api-doc/src-UTF8.html
it does seem that UTF8's encode function is more stringent than
utf8-string's. So it seems that the consequence of having a crazy
character is that we would have gotten an error along the lines of

  encodeUTF8: ord returned a value above 0x10FFFF

anyway... and since that hasn't happened in the past to my knowledge,
we can probably relax (although we should try to think about keeping
the same behaviour, or introducing it into the utf8-string package).

Anyway, that's the current state of my flip-flop. I'll probably apply
it tomorrow unless somebody shouts.

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
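
To make the 21-bit question above concrete, here is a minimal sketch of
a UTF-8 encoder with an explicit range check, assuming GHC, where
maxBound :: Char is '\x10FFFF'. The names encodeChar and encodeString
are hypothetical; this is not the code from UTF8.lhs or from
utf8-string, only an illustration of where a "value above 0x10FFFF"
check would sit.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical helper: encode one Char as UTF-8 bytes.
-- The last guard mirrors the kind of check discussed in the thread;
-- with GHC's Char it should be unreachable, because a Char built via
-- chr/toEnum cannot exceed 0x10FFFF.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80      = [fromIntegral n]
  | n < 0x800     = [0xC0 .|. hi 6,  cont 0]
  | n < 0x10000   = [0xE0 .|. hi 12, cont 6,  cont 0]
  | n <= 0x10FFFF = [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  | otherwise     = error "encodeChar: ord returned a value above 0x10FFFF"
  where
    n      = ord c
    -- Leading-byte payload: the bits above position s.
    hi s   = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six bits starting at s.
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)

-- Hypothetical counterpart of 'encode :: String -> [Word8]'.
encodeString :: String -> [Word8]
encodeString = concatMap encodeChar
```

One caveat on this sketch: it does not reject surrogate code points
(U+D800 to U+DFFF), which a GHC Char can hold and which are not valid
in strict UTF-8, so "fits in 21 bits" and "valid Unicode" are not quite
the same condition.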
_______________________________________________
darcs-users mailing list
http://lists.osuosl.org/mailman/listinfo/darcs-users
