Re: [CODE4LIB] marc21 and usmarc

Tod Olson Fri, 23 Jan 2009 04:15:30 -0800

On Jan 23, 2009, at 5:52 AM, Eric Lease Morgan wrote:

On 1/23/09 4:39 AM, "Brown, Alan" <[email protected]> wrote:
Does anybody here know the difference between MARC21 and USMARC?

I am munging sets of MARC bibliographic data from a III catalog with
holdings data from the same. I am using MARC::Batch to read my bib
data (with both strict and warnings turned off), insert 853 and 863
fields, and write the data using the as_usmarc method. Therefore, I
think I am creating USMARC files. I can then use marcdump to... dump
the records. It returns 0 errors.

Eric, this isn't an encoding thing, is it? I know that a number of III
catalogues still encode their diacritics using the MARC8 version of
USMARC. We have changed ours to Unicode now, but we did have an issue of
the catalogue outputting Unicode records that weren't tagged as such in
the leader and so couldn't be identified as proper MARC21 (current
version of USMARC). III have solved this with their latest release. This
issue had me scratching my head with a lot of my MARC::Record scripts,
but generally they failed quite spectacularly.

Actually, I believe I am suffering from a number of different types of
errors in my MARC data: 1) encoding issues (MARC8 versus UTF-8), 2)
syntactical errors (lack of periods, invalid choices of indicators,
etc.), 3) incorrect data types (strings entered into fields denoted for
integers, etc.). Just about the only thing I haven't encountered are
structural errors such as an invalid leader, and this doesn't even take
into account possible data entry errors (author is Franklin when Twain
was entered).

Yes, I do have an encoding issue. All of my incoming records are in
MARC8. I'm not sure, but I think the Primo tool expects UTF-8. I can
easily update the encoding bit (change leader position 09 from blank to
a), but this does not change any actual encoding in the bibliographic
section of my data. Consequently, after updating the encoding bit and
looping through my munged data, MARC::Record chokes on records that are
flagged as UTF-8 but still contain MARC8 characters, with the following
error:

 utf8 "\xE8" does not map to Unicode at
 /usr/lib/perl5/5.8.8/i686-linux/Encode.pm line 166.
Upon looking at the raw MARC, I see that the offending record includes
the word Münich. What can I do to transform MARC8 data into UTF-8? What
can I do to trap the error above and skip these invalid records?
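
On the second question, trapping the error and skipping the bad records, the idea can be sketched in a few lines. This is only an illustration in Python using the standard library, not the MARC::Record approach from the thread; the function names and the toy byte strings are made up for the example. It assumes records are separated by the ISO 2709 record terminator (0x1D). In MARC8, 0xE8 is the combining umlaut and is written before the letter it modifies, while in UTF-8 a 0xE8 byte must be followed by two continuation bytes, which is why a MARC8 "M\xE8unich" cannot be read as UTF-8.

```python
RECORD_TERMINATOR = b"\x1d"  # ISO 2709 / MARC end-of-record byte


def split_records(data):
    """Split a raw MARC byte stream into individual records."""
    return [chunk + RECORD_TERMINATOR
            for chunk in data.split(RECORD_TERMINATOR) if chunk]


def partition_by_utf8(data):
    """Return (good, bad): records that decode as UTF-8 and those that do not."""
    good, bad = [], []
    for raw in split_records(data):
        try:
            raw.decode("utf-8")          # raises on stray MARC8 bytes such as 0xE8
        except UnicodeDecodeError:
            bad.append(raw)              # trap the error and set the record aside
        else:
            good.append(raw)
    return good, bad


# Toy byte strings standing in for records (hypothetical, not complete MARC).
sample = b"M\xe8unich\x1d" + "München".encode("utf-8") + b"\x1d"
good, bad = partition_by_utf8(sample)
print(len(good), "usable,", len(bad), "skipped")
```

The skipped records are kept rather than discarded so they can be converted from MARC8 later. In Perl, the analogous move would be to wrap the failing call in an eval block and continue with the next record.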

We've had good luck with the yaz-marcdump utility that's included with
the YAZ toolkit. We're using it to convert our exported Horizon records
from MARC8 to UTF-8 before we import into AquaBrowser. The tool is easy
to compile, blindingly fast, forgiving of common MARC errors, and
changes the coding correctly. It's been serving us well.
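
For anyone who wants to script that conversion, here is a rough sketch of driving yaz-marcdump from Python. The flags are the commonly documented ones as I understand them (-f and -t for the source and target character sets, -o marc for binary MARC output, -l 9=97 to set leader position 09 to "a"); please check them against the man page for your installed YAZ version. The file names and helper names are hypothetical.

```python
import subprocess


def yaz_marc8_to_utf8_cmd(src):
    """Build the argv for a MARC8 -> UTF-8 conversion (flags are assumptions)."""
    return [
        "yaz-marcdump",
        "-f", "MARC-8",   # character set of the input records
        "-t", "UTF-8",    # character set to convert to
        "-o", "marc",     # write binary MARC (ISO 2709) rather than a text dump
        "-l", "9=97",     # set leader position 09 to 'a' (ASCII code 97)
        src,
    ]


def convert(src, dst):
    """Run yaz-marcdump on src, writing the converted records to dst."""
    with open(dst, "wb") as out:
        subprocess.run(yaz_marc8_to_utf8_cmd(src), stdout=out, check=True)


# Example call, assuming yaz-marcdump is on PATH and export.mrc exists:
# convert("export.mrc", "export-utf8.mrc")
```

Letting the tool set the leader byte as part of the same pass keeps the flag and the actual encoding in step, which is exactly what changing position 09 by hand does not do.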


-Tod

Tod Olson <[email protected]>
Systems Librarian
University of Chicago Library
