dc-rda  

Re: MARC and Unicode normalization forms

Karen Coyle
Wed, 18 Mar 2009 10:58:15 -0700

Simon, yes, I had more or less assumed that C is considered the default, especially given that the Java language routines warn about it in their own default way. However, I heard from a staff member at the Unicode org (no longer there) that the technical committees were warming to the decomposed form (D) because of its greater flexibility. Now, the big question is: does RDA itself have any preference? I suspect not, so we're back to library practice, and the fact that library transliteration has created some characters that can only be re-created in Unicode using D. Personally, I think it may be necessary for libraries to re-examine why they need characters that no one else uses, and whether those should shape the future of library data... but I think that's a discussion we'll have later, as we get further into data development.
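
To make the C/D difference concrete, here is a minimal sketch using Python's standard unicodedata module. The sample strings are assumptions chosen for illustration: "é" stands in for any character that has a precomposed code point, and "t" and "s" joined by the combining ligature halves (U+FE20, U+FE21) stand in for the kind of library transliteration character that has no precomposed code point.

```python
import unicodedata

# A character with a precomposed code point: the two forms differ.
e_acute = "\u00e9"                                  # LATIN SMALL LETTER E WITH ACUTE
e_nfd = unicodedata.normalize("NFD", e_acute)       # "e" followed by U+0301
e_nfc = unicodedata.normalize("NFC", e_nfd)         # recomposed to U+00E9

# Illustrative transliteration sequence (assumed example): t and s joined
# by the combining ligature left and right halves. Unicode has no single
# precomposed code point for this, so it is always a combining sequence.
ligature = "t\ufe20s\ufe21"
lig_nfc = unicodedata.normalize("NFC", ligature)
lig_nfd = unicodedata.normalize("NFD", ligature)

print(len(e_nfd), len(e_nfc))        # code point counts in D and in C
print(lig_nfc == lig_nfd)            # whether the two forms coincide here
```

In this sketch the ligature sequence comes out the same under both forms, because there is nothing for C to compose it into; only the base letter plus combining marks can represent it.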

kc

Simon Spero wrote:
On Tue, Mar 17, 2009 at 7:38 PM, Karen Coyle <kco...@kcoyle.net> wrote:

Rebecca,

Thank you so much for looking into this. As I understand the Unicode normal
forms, it's not that one of them is more "correct" than others, it's a
matter of circumstance and your particular needs. It does look like it would
be good for program developers to document what form their program outputs
in an effort to "save the time of the user."
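
As a rough sketch of how a program could check, and so document, the form of the text it emits, the following uses Python's standard unicodedata module. The function name normal_forms is hypothetical and only for illustration.

```python
import unicodedata

def normal_forms(text):
    """Return which of NFC and NFD the given text already satisfies.

    A string is in a given form if normalizing it to that form
    leaves it unchanged.
    """
    return [
        form
        for form in ("NFC", "NFD")
        if unicodedata.normalize(form, text) == text
    ]

print(normal_forms("\u00e9"))     # precomposed e with acute
print(normal_forms("e\u0301"))    # e followed by combining acute
print(normal_forms("abc"))        # plain ASCII satisfies both forms
```

A check like this could run on a sample of a program's output before the form is stated in its documentation.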


The W3C recommends NFC: http://www.w3.org/TR/charmod-norm/#sec-ChoiceNFC

As does "HOWTO Avoid Being Called a Bozo When Producing XML":
http://hsivonen.iki.fi/producing-xml/#nfc
(cited from: http://www.ibm.com/developerworks/xml/library/x-think35.html)


--
-----------------------------------
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net http://www.kcoyle.net
ph.: 510-540-7596   skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234
------------------------------------