Re: Marc::XML with MARC21
Dear Ed,

yes, it works as expected! I've just tried again with my marc.xml (attached), and there seem to be encoding problems. Maybe Aleph doesn't export in UTF-8?

Thanks for your help,
Michele

Ed Summers wrote:
> Hi Michele:
>
> I copied and pasted the XML from your email and ran it through a simple test script (both attached) and the record seemed to be parsed ok. What do you see if you run the attached test.pl?
>
> //Ed

--
|| Michele Pinassi
|| System Manager Area Sistema Biblioteche - UniSi
|| https://sites.google.com/a/unisi.it/o-zone/
|| Assistenza: +39.577.232299 (int. 2299)
|| Personale: +39.577.232477 (int. 2477)
|| FAX: +39.577.232430 (int. 2430)

Attachment: marc.xml (a single line in the original file; wrapped here for readability)

<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
  <marc:leader>^cam^^22^^i^4500</marc:leader>
  <marc:controlfield tag="001">000762662</marc:controlfield>
  <marc:datafield tag="020" ind1=" " ind2=" ">
    <marc:subfield code="a">8814075913</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="040" ind1=" " ind2=" ">
    <marc:subfield code="a">IT</marc:subfield>
    <marc:subfield code="-">Servizio Bibliotecario Senese</marc:subfield>
    <marc:subfield code="e">RICA</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="300" ind1=" " ind2=" ">
    <marc:subfield code="a">VI, 262 p. ;</marc:subfield>
    <marc:subfield code="c">24 cm</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="653" ind1="0" ind2=" ">
    <marc:subfield code="a">Navigazione da diporto</marc:subfield>
    <marc:subfield code="a">Legislazione</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="700" ind1="1" ind2=" ">
    <marc:subfield code="a">Antonini,Alfredo</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="700" ind1="1" ind2=" ">
    <marc:subfield code="a">Morandi,Francesco</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="041" ind1="0" ind2=" ">
    <marc:subfield code="a">ita</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="245" ind1="1" ind2="0">
    <marc:subfield code="a">La navigazione da diporto :</marc:subfield>
    <marc:subfield code="b">le infrastrutture, l' organizzazione, i contratti e le responsabilità :</marc:subfield>
    <marc:subfield code="b">atti del convegno, Trieste, 27 marzo 1998 /</marc:subfield>
    <marc:subfield code="c">a cura di Alfredo Antonini e Francesco Morandi</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="260" ind1=" " ind2=" ">
    <marc:subfield code="a">Milano :</marc:subfield>
    <marc:subfield code="b">Giuffrè</marc:subfield>
    <marc:subfield code="c">1999</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="490" ind1=" " ind2="0">
    <marc:subfield code="a">Collana del Dipartimento di scienze giuridiche e della Facoltà di giurisprudenza dell' Università di Modena e Reggio Emilia</marc:subfield>
    <marc:subfield code="p">Nuova serie ;</marc:subfield>
    <marc:subfield code="v">0048</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="760" ind1="1" ind2=" ">
    <marc:subfield code="t">Collana del Dipartimento di scienze giuridiche e della Facoltà di giurisprudenza dell' Università di Modena e Reggio Emilia</marc:subfield>
    <marc:subfield code="g">0048</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="082" ind1=" " ind2=" ">
    <marc:subfield code="a">343.45096</marc:subfield>
    <marc:subfield code="2">20</marc:subfield>
  </marc:datafield>
  <marc:controlfield tag="008">^^sxx^|r^|||</marc:controlfield>
</marc:record>
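One way to test the suspicion that the export is not UTF-8 is to try a strict decode of the raw bytes before parsing. The following is only a minimal sketch using the core Encode module; the helper name `is_valid_utf8` and the sample strings are made up for illustration and are not taken from the attached script.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns 1 if the byte string decodes cleanly as
# strict UTF-8, 0 otherwise.
sub is_valid_utf8 {
    my ($octets) = @_;    # work on a copy; decode() with a CHECK value may modify its argument
    my $ok = eval { Encode::decode('UTF-8', $octets, Encode::FB_CROAK); 1 };
    return $ok ? 1 : 0;
}

# "Giuffr\xE8" is how ISO-8859-1 stores "Giuffrè"; in UTF-8 the same
# character is the two bytes 0xC3 0xA8.
my $latin1_bytes = "<marc:subfield code=\"b\">Giuffr\xE8</marc:subfield>";
my $utf8_bytes   = "<marc:subfield code=\"b\">Giuffr\xC3\xA8</marc:subfield>";

print "latin1 sample valid UTF-8? ", is_valid_utf8($latin1_bytes), "\n";
print "utf8 sample valid UTF-8?   ", is_valid_utf8($utf8_bytes),   "\n";
```

A failed check only says the bytes are not UTF-8; it does not say which encoding they actually are.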
Re: Marc::XML with MARC21
Hi Michele:

Yes, I see a UTF-8 encoding error in that file when I check it with xmllint (from the libxml2 package):

    e...@curry:~/Downloads$ xmllint marc.xml
    marc.xml:1: parser error : Input is not proper UTF-8, indicate encoding !
    Bytes: 0xE0 0x20 0x3A 0x3C
    ld code="b">le infrastrutture, l' organizzazione, i contratti e le responsabilit

This causes MARC::Record->new_from_xml to blow up too, with a somewhat unhelpful error:

    not well-formed (invalid token) at line 1, column 1533, byte 1533 at /usr/lib/perl5/XML/Parser.pm line 187

It looks like your XML file might be in ISO-8859-1 (at least that is what the unix file command told me):

    e...@curry:~/Projects/marc-xml$ file marc.xml
    marc.xml: ISO-8859 text, with very long lines, with no line terminators

So you could try converting your XML string with Encode before handing it off to MARC::Record->new_from_xml:

    use Encode;
    Encode::from_to($xml, 'iso-8859-1', 'utf-8');

I attached the full script, which seems to work OK.

Note: if you are on Ubuntu, it looks like their libmarc-xml-perl package (v0.88) is a few versions behind the latest on CPAN (v0.92), and v0.88 doesn't handle namespaces properly.

//Ed
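The attached script is not reproduced in the archive. As a rough illustration of the conversion step only, the sketch below uses the core Encode module on a short sample string; the helper name `latin1_to_utf8` and the sample are hypothetical, and the hand-off to MARC::Record is shown only as a comment because it needs MARC::File::XML from CPAN.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns a UTF-8 encoded copy of an ISO-8859-1 byte string.
sub latin1_to_utf8 {
    my ($octets) = @_;    # copy, so the caller's string is left untouched
    # from_to() converts in place and returns the new length in bytes,
    # or undef if the conversion failed.
    my $len = Encode::from_to($octets, 'iso-8859-1', 'utf-8');
    die "conversion from ISO-8859-1 failed\n" unless defined $len;
    return $octets;
}

# Sample containing the bytes xmllint reported (0xE0 0x20 0x3A 0x3C).
my $xml_latin1 = "<marc:subfield code=\"b\">responsabilit\xE0 :</marc:subfield>";
my $xml_utf8   = latin1_to_utf8($xml_latin1);

printf "%d bytes before, %d bytes after conversion\n",
    length($xml_latin1), length($xml_utf8);

# With MARC::File::XML installed, the converted string could then be parsed:
#   use MARC::Record;
#   use MARC::File::XML;
#   my $record = MARC::Record->new_from_xml($xml_utf8, 'UTF-8', 'MARC21');
```

Note that from_to() expects raw bytes, so the file should be read without an encoding layer before it is converted.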
Re: Marc::XML with MARC21
> my $file = MARC::Record->new_from_xml($marc->serialize(), "UTF-8", "MARC21");
> $epdata = $plugin->EPrints::Plugin::Import::MARC::convert_input( $file );
>
> and here come troubles: only a few metadata fields are interpreted correctly, and a lot of data is lost.

Ummm, so what metadata makes it through? I see examples of what you feed it, but not what is coming out.

Just from looking quickly at the MARCXML, the only thing that seems really weird right away is the trailing 008 control field. I don't know what the XSD states about the ordering, but typically all the controlfields are at the top of a MARC record.

Jon Gorman
Re: Marc::XML with MARC21
Hi Michele:

I copied and pasted the XML from your email and ran it through a simple test script (both attached) and the record seemed to be parsed ok. What do you see if you run the attached test.pl?

//Ed

Attachments: test.pl, and the MARCXML record as pasted from Michele's email