On Sun, Nov 30, 2008 at 5:01 PM, Eric Kow wrote:
> On Sun, Nov 30, 2008 at 16:34:43 -0500, Gwern Branwen wrote:
>> But as I said before, so far as I know, we shouldn't have to care
>> about possibly erroneous strings because the only function in UTF8.lhs
>> is 'encode :: String -> [Word8]', and the Haskell runtime will never
>> give us a malformed String. (If we do get a bad string, certainly
>> nothing encode can do will make the situation worse.)
>
> Hmm, in that case, I think I might apply the patch then
>
>> The crucial item in ByteStringUtils.hs is unpackPSfromUTF8 ::
>> ByteString -> String. I believe utf8-string handles this:
>> http://hackage.haskell.org/packages/archive/utf8-string/0.3.3/doc/html/Data-ByteString-UTF8.html#v%3AtoString
>> 'Convert a UTF8 encoded bytestring into a Haskell string. Invalid
>> characters are replaced with '\xFFFD'.' (Note what I said earlier
>> about invalid characters.)
>
> So do you think we should also start phasing out this code in favour of
> utf8-string? With my reading of the C and Haskelly bits, I get the
> understanding that we just error out on the first decoding error (so
> not quite the fancy behaviour UTF8.decode used to have that we never
> even used in the first place). If that were the case I might be also
> convinced that utf8-string is the way to go (although maybe it would
> make sense here to also error out instead of silently accepting bad
> strings and replacing them with invalid-character tokens)

I do. I suspect we can easily emulate the call to 'error' in
ByteStringUtils.hs by mapping over the String that utf8-string gives us
and calling error whenever we hit the replacement character, i.e.

    unpackPSfromUTF8 = map (\x -> if x == '\xFFFD' then error "String corrupted" else x) . toString

And odds are that the code in utf8-string is more reliable than the
hand-rolled stuff in Darcs (although the obvious rebuttal to this
suggestion is 'Well, it's worked pretty well for half a decade now.')

-- 
gwern
_______________________________________________
darcs-users mailing list
http://lists.osuosl.org/mailman/listinfo/darcs-users
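
To make the suggestion above a bit more concrete, here is a rough sketch
of the check on the decoded String. It uses only base, so the decoding
step itself is left out; the idea would be to compose one of these with
utf8-string's toString. The names replacementChar, rejectLazy,
rejectStrict and raisesWhenForced are made up for illustration and are
not Darcs or utf8-string functions. Two caveats, as far as I can tell:
the map-based version is lazy, so the error only fires when the bad
character is actually demanded, and any input that legitimately contains
U+FFFD would be rejected as well.

```haskell
import Control.Exception (ErrorCall, evaluate, try)

-- | The character utf8-string is documented to substitute for invalid input.
replacementChar :: Char
replacementChar = '\xFFFD'

-- | Lazy variant, equivalent to the one-liner: each character is checked
-- only when it is demanded, so a consumer that reads a prefix may never
-- see the error.
rejectLazy :: String -> String
rejectLazy = map check
  where
    check c
      | c == replacementChar = error "String corrupted"
      | otherwise            = c

-- | Strict variant (an alternative, not what the one-liner does): scan the
-- whole string up front and report the failure as a value.
rejectStrict :: String -> Either String String
rejectStrict s
  | replacementChar `elem` s = Left "String corrupted"
  | otherwise                = Right s

-- | Force every character of the lazily checked string and report whether
-- the check raised an error.
raisesWhenForced :: String -> IO Bool
raisesWhenForced s = do
  r <- try (evaluate (foldr seq () (rejectLazy s))) :: IO (Either ErrorCall ())
  return (either (const True) (const False) r)
```

With these definitions, take 3 (rejectLazy "bad\xFFFDinput") returns the
first three characters without complaint, while raisesWhenForced on the
same input reports the error. Whether the lazy behaviour is acceptable
depends on how the callers of unpackPSfromUTF8 consume the result.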
