Xavier Agenjo
Tue, 24 Mar 2009 11:18:27 -0700
What about ISO 25577, that is, MarcXchange?
Xavier Agenjo Bullón
Project Director
Fundación Ignacio Larramendi
Claudio Coello, 123, 4º
28006 Madrid
Tel.: (34) 915 81 25 37
Fax: (34) 915 81 47 36
xavier.age...@larramendi.es
www.larramendi.es
ISO 9001 certified.
Do not print unless necessary. Let's protect the environment.

-----Original Message-----
From: List for discussion on Resource Description and Access (RDA) [mailto:dc-...@jiscmail.ac.uk] On behalf of Rebecca S Guenther
Sent: Tuesday, 17 March 2009 20:34
To: DC-RDA@JISCMAIL.AC.UK
Subject: Re: MARC and Unicode normalization forms

I ran this by a colleague here who has done a lot of these transformations, and he said the following:

From Morgan Cundiff:

She says "the MARC -> MARCXML program does not output Unicode Normal Form C". My first question would be "what program is that?". There are quite a few that do this. Whatever it is, she is probably right.

I used Marc Report. I then used the Perl script provided by OCLC to convert the MARC slim file from Normalization Form D (decomposed) to Normalization Form C (composed).

My understanding is that there is no Form C equivalent for a small number of the decomposed combinations used in MARC records. So those stay decomposed.

Morgan

Rebecca S. Guenther
Senior Networking and Standards Specialist
Network Development and MARC Standards Office
Library of Congress
101 Independence Ave. SE
Washington, DC 20540
Washington, DC 20540-4402
(202) 707-5092 (voice)
(202) 707-0115 (FAX)
r...@loc.gov

>>> DC-RDA automatic digest system <lists...@jiscmail.ac.uk> 3/16/2009 8:05 PM >>>

Date: Mon, 16 Mar 2009 10:59:24 -0700
From: Karen Coyle <kco...@kcoyle.net>
Subject: MARC and Unicode normalization forms

Alistair had a large number of error messages about character set problems when he processed records from MARC through various steps into RDF:

WARN [main] (RDFDefaultErrorHandler.java:36) - file:data/mods/part01-split16.mods.xml.rdf(line 249403 column 117): {W131} String not in Unicode Normal Form C: "Muse=CC=81e bibliographique"

WARN [main] (RDFDefaultErrorHandler.java:36) - file:data/mods/part01-split16.mods.xml.rdf(line 249340 column 184): {W131} String not in Unicode Normal Form C: "Versuch einer kurzen Geschichte der ro=CC=88misch-catholischen deutschen Bibelu=CC=88bersetzung"

While I can't explain why these particular examples get the error (and I will keep looking at it), I have some evidence that the MARC -> MARCXML program does not output Unicode Normal Form C. This causes display problems for some characters (although not, as far as I know, the ones in the examples). It is possible to translate the data into Form C if needed.

In any case, it looks like it isn't something that Alistair introduced with his code. If I can figure out for sure that it's a MARCXML issue, I'll suggest that the code should be modified.

kc

--
-----------------------------------
Karen Coyle / Digital Library Consultant
kco...@kcoyle.net
http://www.kcoyle.net
ph.: 510-540-7596
skype: kcoylenet
fx.: 510-848-3913
mo.: 510-435-8234
------------------------------------
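
For readers who want to try the Form D to Form C conversion discussed in the quoted messages, here is a minimal sketch in Python using the standard library's unicodedata module. It is not the OCLC Perl script mentioned above, and the helper names to_nfc and is_nfc are made up for illustration. The sample strings are taken from the two warnings (the bytes CC 81 and CC 88 are the UTF-8 encodings of U+0301 and U+0308); the ligature example with U+FE20/U+FE21 is an assumed illustration of a combination that has no precomposed form.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text converted to Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text: str) -> bool:
    """True if text is already in Form C, i.e. normalizing changes nothing."""
    return unicodedata.normalize("NFC", text) == text


# Decomposed strings as they appear in the warnings: a base letter followed
# by a combining mark (U+0301 acute accent, U+0308 diaeresis).
title_fr = "Muse\u0301e bibliographique"
title_de = "ro\u0308misch-catholischen deutschen Bibelu\u0308bersetzung"

for title in (title_fr, title_de):
    composed = to_nfc(title)
    print(is_nfc(title), is_nfc(composed), len(title), len(composed))

# Assumed illustration: combining ligature halves (U+FE20, U+FE21) have no
# precomposed equivalents, so NFC leaves the sequence unchanged.
ligature = "t\ufe20s\ufe21"
print(to_nfc(ligature) == ligature)
```

Strings that have no precomposed equivalent pass through unchanged, which matches Morgan's remark that a small number of combinations stay decomposed.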