On Sun, Nov 30, 2008 at 16:34:43 -0500, Gwern Branwen wrote:
> But as I said before, so far as I know, we shouldn't have to care
> about possibly erroneous strings because the only function in UTF8.lhs
> is 'encode :: String -> [Word8]', and the Haskell runtime will never
> give us a malformed String. (If we do get a bad string, certainly
> nothing encode can do will make the situation worse.)
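To illustrate why the encoding direction has no error case to worry about, here is a minimal sketch of a UTF-8 encoder with the same type as 'encode' in UTF8.lhs. It uses only base and is not the code darcs actually ships; it just shows that every Char maps to between one and four bytes, so the function is total. The helper name encodeChar is hypothetical.

```haskell
module Main where

import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Sketch of a UTF-8 encoder with the same type as 'encode' in UTF8.lhs
-- (not darcs's implementation). Every Char yields 1-4 bytes, so there
-- is no failure case.
encode :: String -> [Word8]
encode = concatMap encodeChar

-- Hypothetical helper: encode a single code point.
-- No validity check is done: surrogate code points (U+D800..U+DFFF),
-- which a Char can hold, come out as 3-byte sequences.
encodeChar :: Char -> [Word8]
encodeChar c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [ 0xC0 .|. fromIntegral (n `shiftR` 6)
                  , cont n ]
  | n < 0x10000 = [ 0xE0 .|. fromIntegral (n `shiftR` 12)
                  , cont (n `shiftR` 6)
                  , cont n ]
  | otherwise   = [ 0xF0 .|. fromIntegral (n `shiftR` 18)
                  , cont (n `shiftR` 12)
                  , cont (n `shiftR` 6)
                  , cont n ]
  where
    n = ord c
    -- continuation byte 10xxxxxx carrying the low 6 bits of x
    cont x = 0x80 .|. fromIntegral (x .&. 0x3F)

main :: IO ()
main = print (encode "a\233\8364") -- prints [97,195,169,226,130,172]
```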
Hmm, in that case, I think I might apply the patch then.

> The crucial item in ByteStringUtils.hs is unpackPSfromUTF8 ::
> ByteString -> String. I believe utf8-string handles this:
> http://hackage.haskell.org/packages/archive/utf8-string/0.3.3/doc/html/Data-ByteString-UTF8.html#v%3AtoString
> 'Convert a UTF8 encoded bytestring into a Haskell string. Invalid
> characters are replaced with '\xFFFD'.' (Note what I said earlier
> about invalid characters.)

So do you think we should also start phasing out this code in favour of utf8-string? From my reading of the C and Haskelly bits, my understanding is that we just error out on the first decoding error (so not quite the fancy behaviour UTF8.decode used to have, which we never even used in the first place).

If that is the case, I might also be convinced that utf8-string is the way to go (although it might make sense here to error out as well, instead of silently accepting bad strings and substituting U+FFFD replacement characters).

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
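For comparison, the two decoding policies discussed above (stop at the first bad sequence, versus substitute U+FFFD the way utf8-string's toString is documented to do) can be sketched in plain Haskell over a [Word8] input. This is a sketch under assumptions, not darcs's unpackPSfromUTF8 and not utf8-string's code; decodeOne, decodeStrict and decodeLenient are hypothetical names. The strict variant reports the byte offset of the first malformed sequence instead of calling error.

```haskell
module Main where

import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical helper: decode one code point from the front of the input.
-- Nothing means the leading bytes are not well-formed UTF-8.
decodeOne :: [Word8] -> Maybe (Char, [Word8])
decodeOne [] = Nothing
decodeOne (b : rest)
  | b < 0x80           = Just (chr (fromIntegral b), rest)
  | b .&. 0xE0 == 0xC0 = multi 1 0x80    (fromIntegral (b .&. 0x1F))
  | b .&. 0xF0 == 0xE0 = multi 2 0x800   (fromIntegral (b .&. 0x0F))
  | b .&. 0xF8 == 0xF0 = multi 3 0x10000 (fromIntegral (b .&. 0x07))
  | otherwise          = Nothing -- stray continuation byte or invalid lead byte
  where
    -- n: continuation bytes expected; lo: smallest code point that needs
    -- this many bytes (rejects overlong forms); acc: payload bits so far.
    multi :: Int -> Int -> Int -> Maybe (Char, [Word8])
    multi n lo acc
      | length conts /= n || not (all isCont conts) = Nothing -- truncated or bad continuation
      | cp < lo || cp > 0x10FFFF                    = Nothing -- overlong or out of range
      | cp >= 0xD800 && cp <= 0xDFFF                = Nothing -- UTF-16 surrogate
      | otherwise                                   = Just (chr cp, rest')
      where
        (conts, rest') = splitAt n rest
        cp = foldl (\a c -> (a `shiftL` 6) .|. fromIntegral (c .&. 0x3F)) acc conts
    isCont c = c .&. 0xC0 == 0x80

-- Strict policy: give up at the first malformed sequence and report
-- its byte offset.
decodeStrict :: [Word8] -> Either Int String
decodeStrict = go 0
  where
    go _ [] = Right []
    go off bs = case decodeOne bs of
      Nothing        -> Left off
      Just (c, rest) -> (c :) <$> go (off + length bs - length rest) rest

-- Lenient policy: replace each undecodable byte with U+FFFD and carry on.
decodeLenient :: [Word8] -> String
decodeLenient [] = []
decodeLenient bs@(_ : bs') = case decodeOne bs of
  Just (c, rest) -> c : decodeLenient rest
  Nothing        -> '\xFFFD' : decodeLenient bs'

main :: IO ()
main = do
  let bad = [0x61, 0xFF, 0x62] -- 'a', an invalid byte, 'b'
  print (decodeStrict bad)     -- prints Left 1
  print (decodeLenient bad)    -- prints "a\65533b"
```

The lenient sketch emits one U+FFFD per rejected byte; utf8-string's exact resynchronisation after a bad sequence may differ, so its documentation is the authority on what toString actually returns.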
_______________________________________________
darcs-users mailing list
http://lists.osuosl.org/mailman/listinfo/darcs-users
