-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Sun, Nov 30, 2008 at 5:01 PM, Eric Kow wrote:
> On Sun, Nov 30, 2008 at 16:34:43 -0500, Gwern Branwen wrote:
>> But as I said before, so far as I know, we shouldn't have to care
>> about possibly erroneous strings because the only function in UTF8.lhs
>> is 'encode :: String -> [Word8]', and the Haskell runtime will never
>> give us a malformed String. (If we do get a bad string, certainly
>> nothing encode can do will make the situation worse.)
>
> Hmm, in that case, I think I might apply the patch then
>
>> The crucial item in ByteStringUtils.hs is unpackPSfromUTF8 ::
>> ByteString -> String. I believe utf8-string handles this:
>>
>> http://hackage.haskell.org/packages/archive/utf8-string/0.3.3/doc/html/Data-ByteString-UTF8.html#v%3AtoString
>> 'Convert a UTF8 encoded bytestring into a Haskell string. Invalid
>> characters are replaced with '\xFFFD'.' (Note what I said earlier
>> about invalid characters.)
>
> So do you think we should also start phasing out this code in favour of
> utf8-string?  With my reading of the C and Haskelly bits, I get the
> understanding that we just error out on the first decoding error (so
> not quite the fancy behaviour UTF8.decode used to have that we never
> even used in the first place).  If that were the case I might be also
> convinced that utf8-string is the way to go (although maybe it would
> make sense here to also error out instead of silently accepting bad
> strings and replacing them with invalid-character tokens)

I do. We can easily emulate the call to 'error' in ByteStringUtils.hs,
I suspect, by mapping over the String that utf8-string produces and
calling error on any replacement character. (i.e. 'unpackPSfromUTF8 =
map (\x -> if x == '\xFFFD' then error "String corrupted" else x) .
toString', using the toString linked above.)
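
Spelled out as a module, the idea might look like the sketch below. It
is untested against Darcs itself and assumes the toString from
Data.ByteString.UTF8 (utf8-string) cited above; the helper name 'check'
is made up for illustration.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.UTF8 as UTF8  -- from the utf8-string package

-- Sketch of a strict stand-in for unpackPSfromUTF8: decode leniently
-- with utf8-string, then treat the replacement character as an error.
unpackPSfromUTF8 :: B.ByteString -> String
unpackPSfromUTF8 = map check . UTF8.toString
  where
    -- 'check' is a hypothetical helper, not part of Darcs or utf8-string.
    check '\xFFFD' = error "String corrupted"
    check c        = c
```

Two caveats with this approach: a U+FFFD that was legitimately encoded
in the input is rejected as well, and because the String is lazy the
error only fires once the offending character is actually forced, not
at the point of the call.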

And odds are that the code in utf8-string is more reliable than the
hand-rolled stuff in Darcs (although the obvious rebuttal to this
suggestion is 'Well, it's worked pretty well for half a decade now.')

- --
gwern
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEAREKAAYFAkkzT68ACgkQvpDo5Pfl1oL3IwCcCAcdZL5gIjhzBDsBZBlNrYV6
flgAoJf0blgkUkuajYDSFiRw6fGsCv7/
=A6/B
-----END PGP SIGNATURE-----
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
