[darcs-users] high-level UTF-8 feedback

Eric Kow Wed, 23 Dec 2009 09:41:01 -0800

[Re-sent as I forgot to send the original to the list]

Hi Reinier,


We were just idly chatting in #darcs this morning when the new UTF-8
work (which just made it in, BTW, yay!) came up.

As our resident Unicode gurus, I thought you and Juliusz would be
interested in the conversation.  Given that you've thought about and
discussed this so much, it may all be old territory.  However, here
is the conversation anyway in case there is new stuff in it for you to
see.

  http://irclog.perlgeek.de/darcs/2009-12-23#i_1864781

Juliusz: no requests on my part.  I thought you may want to see it
because there is an argument for going back to tagging new patches,
but don't feel obliged to weigh in if you're busy :-)

Summary for the impatient
-------------------------
The main principles here are that types are good and so is a
canonical representation.  -- Duncan

1. A type for when we know we're dealing with an internal representation of
   Unicode chars is a good idea, even if it is just a ByteString newtype.

2. We should make sure that there are not any hidden Unicode -> locale ->
   Unicode trips lurking in the code
   [and that it's strictly Unicode -> locale; or locale -> Unicode at most]

3. We should make sure that our Unicode -> locale is merely lossy and not
   partial.

4. We should watch out for non-NFC UTF-8 patches.
   These should be treated as old-style.

5. Despite our confidence about UTF-8 detection, tagging is still a good idea
   [a] for sheer conservatism, [b] for potential efficiency gains, and [c]
   because it lets us reliably distinguish NFC UTF-8 from non-NFC UTF-8.
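
To make points 1 to 3 concrete, here is a minimal Haskell sketch. It is not
actual darcs code: the names InternalUnicode, LocaleBytes, decodeLenient and
toLatin1 are made up for illustration, the internal representation is assumed
to be UTF-8, and the locale is assumed to be Latin-1. The idea is that the
newtypes keep the two kinds of bytes apart, the only conversion offered goes
one way, and that conversion never fails: it just loses information.

```haskell
import Data.Bits (shiftL, (.&.), (.|.))
import qualified Data.ByteString as B
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical wrapper (not a real darcs type): bytes known to be in the
-- internal representation, assumed here to be UTF-8.
newtype InternalUnicode = InternalUnicode B.ByteString
  deriving (Eq, Show)

-- Hypothetical wrapper: bytes already encoded for the user's locale.
newtype LocaleBytes = LocaleBytes B.ByteString
  deriving (Eq, Show)

-- Total UTF-8 decoder: malformed input yields U+FFFD instead of an error.
-- It is deliberately lenient (overlong forms are not rejected).
decodeLenient :: [Word8] -> String
decodeLenient [] = []
decodeLenient (b : bs)
  | b < 0x80 = chr (fromIntegral b) : decodeLenient bs
  | b .&. 0xE0 == 0xC0 = multi 1 (b .&. 0x1F)
  | b .&. 0xF0 == 0xE0 = multi 2 (b .&. 0x0F)
  | b .&. 0xF8 == 0xF0 = multi 3 (b .&. 0x07)
  | otherwise = '\xFFFD' : decodeLenient bs
  where
    -- Decode n continuation bytes following a lead byte's payload bits.
    multi :: Int -> Word8 -> String
    multi n lead
      | length cont == n && all isCont cont && cp <= 0x10FFFF =
          chr cp : decodeLenient rest
      | otherwise = '\xFFFD' : decodeLenient bs
      where
        (cont, rest) = splitAt n bs
        cp = foldl step (fromIntegral lead) cont :: Int
    isCont w = w .&. 0xC0 == 0x80
    step acc w = (acc `shiftL` 6) .|. fromIntegral (w .&. 0x3F)

-- One-way, total, lossy conversion to a Latin-1 locale (assumed for the
-- example): anything Latin-1 cannot represent becomes '?'.
toLatin1 :: InternalUnicode -> LocaleBytes
toLatin1 (InternalUnicode raw) =
  LocaleBytes (B.pack (map encode (decodeLenient (B.unpack raw))))
  where
    encode c
      | ord c < 0x100 = fromIntegral (ord c)
      | otherwise = 0x3F -- '?'
```

Since there is deliberately no function from LocaleBytes back to
InternalUnicode, a hidden Unicode -> locale -> Unicode round trip would show
up as a type error rather than as silently damaged data.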

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9


