[Re-sent as I forgot to send the original to the list]

Hi Reinier,

We were just idly chatting in #darcs this morning when the new UTF-8 work (which just made it in, BTW, yay!) came up. Since you and Juliusz are our resident Unicode gurus, I thought you would be interested in the conversation. Given that you've thought about and discussed this so much, it may all be old territory. However, here is the conversation anyway in case there is new stuff in it for you to see.

http://irclog.perlgeek.de/darcs/2009-12-23#i_1864781

Juliusz: no requests on my part. I thought you might want to see it because there is an argument for going back to tagging new patches, but don't feel obliged to weigh in if you're busy :-)

Summary for the impatient
-------------------------
  The main principles here are that types are good and so is a canonical representation. -- Duncan

1. A type for when we know we're dealing with an internal representation of Unicode chars is a good idea, even just a ByteString newtype
2. We should make sure that there are not any hidden Unicode -> locale -> Unicode trips lurking in the code [and that it's strictly Unicode -> locale; or locale -> Unicode at most]
3. We should make sure that our Unicode -> locale is merely lossy and not partial.
4. We should watch out for non-NFC UTF-8 patches. These should be treated as old-style.
5. Despite our confidence about UTF-8 detection, tagging is still a good idea [a] for sheer conservatism, [b] for potential efficiency gains and [c] because it lets us reliably distinguish NFC UTF-8 vs the other one.

-- 
Eric Kow <http://www.nltg.brighton.ac.uk/home/Eric.Kow>
PGP Key ID: 08AC04F9
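
Points 3 and 4 of the summary can be made concrete with a small sketch. It is written in Python purely for illustration and is not darcs code; the function names `looks_like_nfc_utf8` and `to_locale_lossy` are hypothetical. The first function checks whether a byte string is valid UTF-8 that is already in NFC; the second is a Unicode -> locale conversion that is lossy but total, so it never fails on unrepresentable characters.

```python
import unicodedata


def looks_like_nfc_utf8(raw: bytes) -> bool:
    """Return True if raw is valid UTF-8 and already in NFC (hypothetical helper)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not UTF-8 at all, so it cannot be NFC UTF-8.
        return False
    # NFC text is unchanged by NFC normalization.
    return unicodedata.normalize("NFC", text) == text


def to_locale_lossy(text: str, encoding: str) -> bytes:
    """Unicode -> locale conversion that is lossy but never partial.

    Characters the target encoding cannot represent become '?'
    instead of raising an error.
    """
    return text.encode(encoding, errors="replace")
```

Under this sketch, a patch whose bytes fail `looks_like_nfc_utf8` would be the kind treated as old-style in point 4, which is also where explicit tagging (point 5) would remove the need to guess.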
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
