Re: [darcs-users] Fwd: UTF8 patch

Eric Kow Sun, 30 Nov 2008 14:01:40 -0800

On Sun, Nov 30, 2008 at 16:34:43 -0500, Gwern Branwen wrote:
> But as I said before, so far as I know, we shouldn't have to care
> about possibly erroneous strings because the only function in UTF8.lhs
> is 'encode :: String -> [Word8]', and the Haskell runtime will never
> give us a malformed String. (If we do get a bad string, certainly
> nothing encode can do will make the situation worse.)


Hmm, in that case, I think I might apply the patch.
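For reference, here is a minimal sketch of what an encoder with the type 'encode :: String -> [Word8]' does. It uses only base. It is not the code in UTF8.lhs, and the name 'encodeUtf8' is hypothetical. Each Char's code point becomes one to four bytes, so it can only fail if the String itself is odd:

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.Word (Word8)

-- Hypothetical stand-in with the same type as UTF8.lhs's 'encode';
-- not the darcs implementation. Each code point becomes 1 to 4 bytes.
-- Lone surrogate Chars are not special-cased in this sketch.
encodeUtf8 :: String -> [Word8]
encodeUtf8 = concatMap (map fromIntegral . encodeChar . ord)
  where
    encodeChar :: Int -> [Int]
    encodeChar c
      | c < 0x80    = [c]
      | c < 0x800   = [0xC0 .|. (c `shiftR` 6), cont c]
      | c < 0x10000 = [0xE0 .|. (c `shiftR` 12), cont (c `shiftR` 6), cont c]
      | otherwise   = [ 0xF0 .|. (c `shiftR` 18), cont (c `shiftR` 12)
                      , cont (c `shiftR` 6), cont c ]
    -- Continuation byte 10xxxxxx carrying the low six bits of x.
    cont x = 0x80 .|. (x .&. 0x3F)
```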

> The crucial item in ByteStringUtils.hs is unpackPSfromUTF8 ::
> ByteString -> String. I believe utf8-string handles this:
> http://hackage.haskell.org/packages/archive/utf8-string/0.3.3/doc/html/Data-ByteString-UTF8.html#v%3AtoString
> 'Convert a UTF8 encoded bytestring into a Haskell string. Invalid
> characters are replaced with '\xFFFD'.' (Note what I said earlier
> about invalid characters.)
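To make the quoted utf8-string behaviour concrete, here is a rough sketch of a lenient decoder that substitutes '\xFFFD' for malformed input. It works on [Word8] rather than ByteString so that it needs only base. 'decodeLenient' is a hypothetical name, not the utf8-string API. The number of replacement characters it emits for a given bad sequence may differ from what utf8-string actually does:

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical lenient decoder (not utf8-string's code): every malformed
-- sequence becomes '\xFFFD' and decoding carries on.
decodeLenient :: [Word8] -> String
decodeLenient [] = []
decodeLenient (b : bs)
  | b < 0x80               = chr (fromIntegral b) : decodeLenient bs
  | b >= 0xC2 && b <= 0xDF = multi 1 0x1F 0x80
  | b >= 0xE0 && b <= 0xEF = multi 2 0x0F 0x800
  | b >= 0xF0 && b <= 0xF4 = multi 3 0x07 0x10000
  | otherwise              = '\xFFFD' : decodeLenient bs
  where
    -- n continuation bytes, mask for the lead byte's payload bits,
    -- minCp to reject overlong encodings.
    multi n mask minCp =
      let (conts, rest) = splitAt n bs
          step acc c = (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)
          cp = foldl step (fromIntegral b .&. mask) conts
          inRange = cp >= minCp && cp <= 0x10FFFF
                      && (cp < 0xD800 || cp > 0xDFFF)
      in if length conts == n && all isCont conts
           then (if inRange then chr cp else '\xFFFD') : decodeLenient rest
           -- Truncated or broken sequence: replace the lead byte only.
           else '\xFFFD' : decodeLenient bs
    isCont c = c .&. 0xC0 == 0x80
```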

So do you think we should also start phasing out this code in favour of
utf8-string?  From my reading of the C and Haskelly bits, I understand
that we just error out on the first decoding error (so not quite the
fancy behaviour UTF8.decode used to have, which we never even used in
the first place).  If that is the case, I might also be convinced that
utf8-string is the way to go (although it might make sense here to
error out instead of silently accepting bad strings and replacing the
invalid characters with '\xFFFD' replacement characters).
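For contrast, here is a sketch of the strict alternative: stop at the first malformed sequence and say where it was, instead of substituting '\xFFFD'. 'decodeStrict' is a hypothetical name. It works on [Word8] for self-containment and is not the existing unpackPSfromUTF8:

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical strict decoder: returns Left with the byte offset of the
-- first malformed sequence rather than replacing it with '\xFFFD'.
decodeStrict :: [Word8] -> Either String String
decodeStrict = go 0
  where
    go :: Int -> [Word8] -> Either String String
    go _ [] = Right []
    go off (b : bs)
      | b < 0x80               = (chr (fromIntegral b) :) <$> go (off + 1) bs
      | b >= 0xC2 && b <= 0xDF = multi 1 0x1F 0x80
      | b >= 0xE0 && b <= 0xEF = multi 2 0x0F 0x800
      | b >= 0xF0 && b <= 0xF4 = multi 3 0x07 0x10000
      | otherwise              = bad
      where
        bad = Left ("invalid UTF-8 at byte offset " ++ show off)
        multi n mask minCp =
          let (conts, rest) = splitAt n bs
              step acc c = (acc `shiftL` 6) .|. (fromIntegral c .&. 0x3F)
              cp = foldl step (fromIntegral b .&. mask) conts
              ok = length conts == n
                && all (\c -> c .&. 0xC0 == 0x80) conts
                && cp >= minCp && cp <= 0x10FFFF
                && (cp < 0xD800 || cp > 0xDFFF)
          in if ok then (chr cp :) <$> go (off + 1 + n) rest else bad
```

Returning Either leaves the policy to the caller: it can turn a Left into an error message, or fall back to lenient decoding if that turns out to be preferable.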

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9


_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
