On Sun, Nov 30, 2008 at 16:34:43 -0500, Gwern Branwen wrote:
> But as I said before, so far as I know,
> we shouldn't have to care about possibly erroneous strings because the
> only function in UTF8.lhs is 'encode :: String -> [Word8]', and the
> Haskell runtime will never give us a malformed String. (If we do get a
> bad string, certainly nothing encode can do will make the situation
> worse.)

So, I was just about to apply this last night before getting attacked by
another fit of paranoia.  I'm going to need just a little bit more
patient coaxing here:

1. Looking at Wikipedia, obviously the most reliable and stable place to
   learn things, I see that UTF-8 has a payload of up to 21 bits.
2. Haskell Chars are 32 bits wide.
3. I presume that by 'malformed' String, you mean "things which cannot
   be represented in 21 bits" or "not valid Unicode".  I'm sorry if my
   language is sloppy here, but I hope what I'm saying makes sense.

So... how do we know that we will never get such a Char?  Ian assures us
that "You can only put valid unicode values in a Char".  But to what
extent is that true?  For example, can we do something nasty and low
level to inadvertently produce those chars?  Is there a scenario where
this kind of thing may happen?  Let's say we're reading in filenames.
What if for some reason or another, the filenames use some crazy
encoding with lots of non-Unicode things?
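
One cheap way to probe the question is to ask GHC what range Char
actually covers.  The following is only a sketch, assuming GHC's base
library; the names maxCharCode, max21Bit and charFitsIn21Bits are made
up for illustration.  It compares the largest code point a Char can
hold against the largest value that fits in 21 bits:

```haskell
import Data.Bits (shiftL)
import Data.Char (ord)

-- Largest code point a Char can hold, according to its Bounded instance.
maxCharCode :: Int
maxCharCode = ord (maxBound :: Char)

-- Largest value that fits in a 21-bit payload.
max21Bit :: Int
max21Bit = (1 `shiftL` 21) - 1

-- True if every possible Char fits in 21 bits.
charFitsIn21Bits :: Bool
charFitsIn21Bits = maxCharCode <= max21Bit
```

Evaluating charFitsIn21Bits in GHCi should give True, since the Haskell
report fixes maxBound :: Char at '\x10FFFF'.  Note that this says
nothing about surrogate code points (0xD800 to 0xDFFF), which a Char
can hold even though they are not valid Unicode scalar values.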

That said, looking at
  http://hackage.haskell.org/packages/archive/utf8-string/0.3.3/doc/html/src/Codec-Binary-UTF8-String.html#encode
vs.
  http://darcs.net/api-doc/src-UTF8.html
it does seem that UTF8's encode function is more stringent than
utf8-string's.  So it seems that the consequences of having a
crazy character are that we would have gotten an error along the
lines of

  encodeUTF8: ord returned a value above 0x10FFFF

anyway... and since that hasn't happened in the past to my knowledge,
we can probably relax (although we should try to think about keeping
the same behaviour, or introducing it into the utf8-string package).
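
To make "the same behaviour" concrete, here is a rough sketch of what a
strict encoder could look like.  This is not the code from UTF8.lhs or
from utf8-string; encodeCharStrict and encodeStrict are hypothetical
names, and returning Either instead of calling error is my own choice
for the illustration.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical strict encoder for a single Char.
-- Returns Left with a message if the code point is above 0x10FFFF.
encodeCharStrict :: Char -> Either String [Word8]
encodeCharStrict c
  | n < 0x80      = Right [fromIntegral n]
  | n < 0x800     = Right [0xC0 .|. hi 6,  cont 0]
  | n < 0x10000   = Right [0xE0 .|. hi 12, cont 6,  cont 0]
  | n <= 0x10FFFF = Right [0xF0 .|. hi 18, cont 12, cont 6, cont 0]
  -- Should be unreachable for any Char built through safe means.
  | otherwise     = Left "encodeCharStrict: code point above 0x10FFFF"
  where
    n = ord c
    -- Leading payload bits, to be combined with the length prefix.
    hi s = fromIntegral (n `shiftR` s)
    -- Continuation byte: 10xxxxxx carrying six payload bits.
    cont s = 0x80 .|. fromIntegral ((n `shiftR` s) .&. 0x3F)

-- Encode a whole String, stopping at the first bad Char.
encodeStrict :: String -> Either String [Word8]
encodeStrict = fmap concat . mapM encodeCharStrict
```

A caller that wants the old crash-on-bad-input behaviour could wrap
this as either error id . encodeStrict.  The sketch does not reject
surrogate code points, which a stricter encoder might also want to do.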

Anyway, that's the current state of my flip-flop.  I'll probably
apply it tomorrow unless somebody shouts.

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9

_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
