The only language that I know of with a library for reading Marc8 and converting to another encoding (such as UTF-8) is Java. The Marc4J package will do it.

I suppose there may be C libraries too; is yaz written in C?

As Michael suggests, the easiest thing to do (if you're not in Java) is probably to use the 'yaz' tools to convert to UTF-8 before anything else touches it.
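
For what it's worth, here is a rough sketch of driving that conversion from Perl. It assumes yaz-marcdump is installed and on your PATH. The sub names (yaz_command, convert_file) and the file names are made up for illustration. The flags are the ones I remember from the yaz-marcdump man page, so check them against your installed version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the yaz-marcdump argument list for a MARC-8 to UTF-8 conversion.
# (Hypothetical helper; only the command and its flags come from yaz.)
sub yaz_command {
    my ($infile) = @_;
    return (
        'yaz-marcdump',
        '-f', 'MARC-8',   # source character set
        '-t', 'UTF-8',    # target character set
        '-o', 'marc',     # write binary MARC (ISO 2709), not a text dump
        '-l', '9=97',     # set leader position 9 to "a" (ASCII 97) to flag Unicode
        $infile,
    );
}

# Run the conversion, copying yaz-marcdump's standard output to $outfile.
sub convert_file {
    my ( $infile, $outfile ) = @_;
    open my $in, '-|', yaz_command($infile)
        or die "Cannot run yaz-marcdump: $!";
    open my $out, '>', $outfile
        or die "Cannot write $outfile: $!";
    binmode $_ for $in, $out;
    local $/ = \65536;    # read in fixed-size chunks, not lines
    print {$out} $_ while <$in>;
    close $in  or die "yaz-marcdump failed (exit status $?)";
    close $out or die "Cannot close $outfile: $!";
    return;
}

# Only convert when called with two file names, e.g.
#   perl marc8_to_utf8.pl records.mrc records-utf8.mrc
convert_file(@ARGV) if @ARGV == 2;
```

The list form of open avoids passing file names through a shell, which matters if they contain spaces or other odd characters.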

If you do end up writing a Marc8 handling library in another language like Perl (presumably you could use the Java code in Marc4J as a guide), please do share! Heh.

On 10/24/2011 2:34 PM, Doran, Michael D wrote:
Hi Eric,

> In Perl, how do I specify MARC-8 when reading (decoding) and writing
> (encoding) data?

You can't.  MARC-8 is a character set that is unknown to the operating system.
Your best bet is to convert MARC-8-encoded records into UTF-8.

> ...it is converted it Perl's
> internal encoding (UTF-8)

As an FYI, UTF-8 is *not* Perl's internal encoding.
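
A minimal sketch of both points, using only the core Encode module. The sample string is made up for illustration. The "no" for MARC-8 is what I would expect from a stock Perl, not something guaranteed on every install.

```perl
use strict;
use warnings;
use Encode qw( encode decode find_encoding );

# Encode looks encodings up by name; a name it does not know yields undef.
my $known = defined( find_encoding('MARC-8') ) ? 'yes' : 'no';
print "Encode knows MARC-8: $known\n";

# Five bytes as read from a file: "caf" plus the two-byte UTF-8 form of e-acute.
my $bytes = "caf\xC3\xA9";

# decode() turns bytes into a string of characters ...
my $chars = decode( 'UTF-8', $bytes );

# ... so length() now counts characters rather than bytes.
printf "bytes: %d, characters: %d\n", length($bytes), length($chars);

# encode() turns characters back into bytes for output.
my $round_trip = encode( 'UTF-8', $chars );
print "round trip ok\n" if $round_trip eq $bytes;
```

The decoded string is best treated as a sequence of characters whose in-memory representation is Perl's own business; UTF-8 only comes into play at the decode and encode boundaries.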

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/



-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric
Lease Morgan
Sent: Monday, October 24, 2011 1:18 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] marc-8

In Perl, how do I specify MARC-8 when reading (decoding) and writing
(encoding) data?

Character encoding is the bane of my existence. I have learned that when
reading from a file I ought to specify the type of encoding the file is in
and decode accordingly, or else. Once read, it is converted to Perl's
internal encoding (UTF-8) and can be manipulated. Similarly, when writing I
am expected to specify the encoding. Both the reading (decoding) and the
writing (encoding) can be done with the Encode module. Here is some code
illustrating what I'm trying to do with MARC records which are apparently in
MARC-8:

   # require
   use strict;
   use warnings;
   use Encode qw( encode decode );
   use MARC::Batch;

   # initialize
   my $batch = MARC::Batch->new( 'USMARC', './records.mrc' );
   open my $out, '>', 'updated.mrc' or die "Can't open updated.mrc: $!";

   # process each record
   while ( my $marc = $batch->next ) {

     # get the title; FOO stands in for the encoding name I don't know
     my $_245 = decode( 'FOO', $marc->title );

     # do cool stuff with the title here

     # output the cool stuff
     print {$out} encode( 'FOO', $_245 );

   }

   # done
   close $out;
   exit;


My problem is, I don't know what to put in place of FOO. What is the official
name of MARC-8's encoding scheme?

--
Eric "The Ugly American" Morgan
University of Notre Dame

(574) 631-8604
