Re: Marc::XML with MARC21

2010-01-26 Thread Michele Pinassi

Dear Ed,
yes, it works as expected! I've just tried again with my marc.xml
(attached), and there seem to be encoding problems. Maybe Aleph doesn't
export in UTF-8?

Thanks for your help,
Michele

Ed Summers wrote:
> Hi Michele:
>
> I copied and pasted the XML from your email and ran it through a
> simple test script (both attached) and the record seemed to be parsed
> ok. What do you see if you run the attached test.pl?
>
> //Ed


--
|| Michele Pinassi
|| System Manager Area Sistema Biblioteche - UniSi
|| https://sites.google.com/a/unisi.it/o-zone/
|| Assistenza: +39.577.232299 (int. 2299)
|| Personale: +39.577.232477 (int. 2477)
|| FAX: +39.577.232430 (int. 2430)
<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
  <marc:leader>^cam^^22^^i^4500</marc:leader>
  <marc:controlfield tag="001">000762662</marc:controlfield>
  <marc:datafield tag="020" ind1=" " ind2=" ">
    <marc:subfield code="a">8814075913</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="040" ind1=" " ind2=" ">
    <marc:subfield code="a">IT</marc:subfield>
    <marc:subfield code="-">Servizio Bibliotecario Senese</marc:subfield>
    <marc:subfield code="e">RICA</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="300" ind1=" " ind2=" ">
    <marc:subfield code="a">VI, 262 p. ;</marc:subfield>
    <marc:subfield code="c">24 cm</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="653" ind1="0" ind2=" ">
    <marc:subfield code="a">Navigazione da diporto</marc:subfield>
    <marc:subfield code="a">Legislazione</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="700" ind1="1" ind2=" ">
    <marc:subfield code="a">Antonini,Alfredo</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="700" ind1="1" ind2=" ">
    <marc:subfield code="a">Morandi,Francesco</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="041" ind1="0" ind2=" ">
    <marc:subfield code="a">ita</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="245" ind1="1" ind2="0">
    <marc:subfield code="a">La navigazione da diporto :</marc:subfield>
    <marc:subfield code="b">le infrastrutture, l' organizzazione, i contratti e le responsabilità :</marc:subfield>
    <marc:subfield code="b">atti del convegno, Trieste, 27 marzo 1998 /</marc:subfield>
    <marc:subfield code="c">a cura di Alfredo Antonini e Francesco Morandi</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="260" ind1=" " ind2=" ">
    <marc:subfield code="a">Milano :</marc:subfield>
    <marc:subfield code="b">Giuffrè</marc:subfield>
    <marc:subfield code="c">1999</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="490" ind1=" " ind2="0">
    <marc:subfield code="a">Collana del Dipartimento di scienze giuridiche e della Facoltà di giurisprudenza dell' Università di Modena e Reggio Emilia</marc:subfield>
    <marc:subfield code="p">Nuova serie ;</marc:subfield>
    <marc:subfield code="v">0048</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="760" ind1="1" ind2=" ">
    <marc:subfield code="t">Collana del Dipartimento di scienze giuridiche e della Facoltà di giurisprudenza dell' Università di Modena e Reggio Emilia</marc:subfield>
    <marc:subfield code="g">0048</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="082" ind1=" " ind2=" ">
    <marc:subfield code="a">343.45096</marc:subfield>
    <marc:subfield code="2">20</marc:subfield>
  </marc:datafield>
  <marc:controlfield tag="008">^^sxx^|r^|||</marc:controlfield>
</marc:record>

Re: Marc::XML with MARC21

2010-01-26 Thread Ed Summers
Hi Michele:

Yes, I see a UTF-8 encoding error in that file when I try to check it
with xmllint (from the libxml2 package):

e...@curry:~/Downloads$ xmllint marc.xml
marc.xml:1: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xE0 0x20 0x3A 0x3C
ld code=ble infrastrutture, l' organizzazione, i contratti e le responsabilit

This causes MARC::Record->new_from_xml to blow up too, with a somewhat
unhelpful error:

not well-formed (invalid token) at line 1, column 1533, byte 1533 at
/usr/lib/perl5/XML/Parser.pm line 187
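
To find where the bad byte sits without xmllint, something like the
following might help. It is only a sketch using the core Encode module;
the helper name first_invalid_utf8_offset is made up for illustration
and is not part of MARC::File::XML or of the attached script. The offset
it reports is 0-based, so it may differ by one from the column that
XML::Parser prints.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return the 0-based offset of the first byte that
# is not part of valid UTF-8, or undef if the whole string is valid.
sub first_invalid_utf8_offset {
    my ($bytes) = @_;
    my $rest = $bytes;    # decode() with a CHECK value consumes its input
    # FB_QUIET stops at the first malformed sequence and leaves the
    # unprocessed remainder in $rest.
    Encode::decode('UTF-8', $rest, Encode::FB_QUIET);
    return length($rest) ? length($bytes) - length($rest) : undef;
}

# Sample bytes modelled on the xmllint report: 0xE0 is Latin-1 "a grave".
my $sample = "responsabilit\xE0 :<";
my $offset = first_invalid_utf8_offset($sample);
printf "first invalid byte at offset %d: 0x%02X\n",
    $offset, ord(substr($sample, $offset, 1)) if defined $offset;
```

A string that ends in the middle of a multi-byte character is also
reported as invalid at the start of that character.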

It looks like your XML file might be in ISO-8859-1 (at least that is
what the Unix file command told me):

e...@curry:~/Projects/marc-xml$ file marc.xml
marc.xml: ISO-8859 text, with very long lines, with no line terminators

So you could try converting your XML string with Encode before handing
it off to MARC::Record->new_from_xml:

  use Encode;
  # converts $xml in place from ISO-8859-1 bytes to UTF-8 bytes
  Encode::from_to($xml, 'iso-8859-1', 'utf-8');
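
If some exports turn out to be UTF-8 already, converting them again
would double-encode the accented characters. A hedged sketch of a guard
for that case follows. The helper name ensure_utf8 is hypothetical, and
it assumes that anything which is not valid UTF-8 is ISO-8859-1, which
is only what the file command suggested for this one file.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return UTF-8 bytes. Input that already decodes as
# strict UTF-8 is returned unchanged; anything else is ASSUMED to be
# ISO-8859-1 and converted.
sub ensure_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its input
    # FB_CROAK makes decode() die on malformed UTF-8, so eval catches it.
    my $already_utf8 = eval {
        Encode::decode('UTF-8', $copy, Encode::FB_CROAK);
        1;
    };
    return $bytes if $already_utf8;
    Encode::from_to($bytes, 'iso-8859-1', 'utf-8');    # in-place conversion
    return $bytes;
}

my $latin1 = "responsabilit\xE0 :";    # bytes as in the xmllint report
my $utf8   = ensure_utf8($latin1);     # "a grave" becomes 0xC3 0xA0
```

The result could then be passed to MARC::Record->new_from_xml with
MARC::File::XML loaded. Note that Latin-1 text which happens to be valid
UTF-8 would slip through unconverted, so this is a heuristic rather than
real detection.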

I attached the full script, which seems to work OK. Note that if you are
on Ubuntu, their libmarc-xml-perl package (v0.88) appears to be a few
versions behind the latest on CPAN (v0.92), and v0.88 doesn't handle
namespaces properly.

//Ed


Re: Marc::XML with MARC21

2010-01-25 Thread Jon Gorman

> my $file = MARC::Record->new_from_xml($marc->serialize(), "UTF-8", "MARC21");
> $epdata = $plugin->EPrints::Plugin::Import::MARC::convert_input( $file );
>
> and here comes the trouble: only a few metadata fields are interpreted
> correctly, and a lot of data is lost.

Ummm, so what metadata makes it through?  I see examples of what you
feed it, but not what is coming out.  Just from looking quickly at the
MARCXML, the only thing that seems really weird right away is the 008
control field trailing at the end of the record.  I don't know what the
XSD says about ordering, but typically all the controlfields are at the
top of a MARC record.
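
A quick way to spot that kind of ordering oddity might look like the
sketch below. It is a plain regex scan in core Perl, not a real XML
parse, and the helper name late_controlfields is invented for
illustration. It assumes double-quoted tag attributes and makes no claim
about what the schema actually requires.

```perl
use strict;
use warnings;

# Hypothetical helper: return the tags of control fields that appear
# after the first data field, in document order.
sub late_controlfields {
    my ($xml) = @_;
    my $seen_datafield = 0;
    my @late;
    # Match opening controlfield/datafield elements, with or without a
    # namespace prefix such as "marc:". Closing tags start with "</" and
    # therefore never match.
    while ($xml =~ /<(?:\w+:)?(controlfield|datafield)\b[^>]*?\btag="([^"]*)"/g) {
        my ($kind, $tag) = ($1, $2);
        if    ($kind eq 'datafield') { $seen_datafield = 1 }
        elsif ($seen_datafield)      { push @late, $tag }
    }
    return @late;
}

my $xml = join '',
    '<marc:record>',
    '<marc:controlfield tag="001">000762662</marc:controlfield>',
    '<marc:datafield tag="245" ind1="1" ind2="0"></marc:datafield>',
    '<marc:controlfield tag="008">^^sxx^|r^|||</marc:controlfield>',
    '</marc:record>';
my @late = late_controlfields($xml);
print "control fields after data fields: @late\n" if @late;
```

For a record shaped like the one in this thread, the scan would flag
008 and leave 001 alone.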

Jon Gorman


Re: Marc::XML with MARC21

2010-01-25 Thread Ed Summers
Hi Michele:

I copied and pasted the XML from your email and ran it through a
simple test script (both attached) and the record seemed to be parsed
ok. What do you see if you run the attached test.pl?

//Ed

