Eric,

Have you tried checking how MARC::Batch views the encoding?

e.g.

# read & write
while ( my $marc = $batch->next ) {
  print $marc->encoding();
  print $marc->as_usmarc;
}

It is supposed to pick up the encoding from position 09 of the leader, but I 
am not sure this is totally reliable. If you know this is definitely a UTF-8 
file you can manually set the encoding (but you shouldn't have to).

e.g.

# read & write
while ( my $marc = $batch->next ) {
  $marc->encoding('UTF-8');
  print $marc->as_usmarc;
}
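
If you want to check leader position 09 yourself, independently of MARC::Batch, 
something like the following might help. It is only a rough sketch using core 
Perl: the sub name leader_encoding is made up for illustration and is not part 
of MARC::Record. It assumes the MARC 21 convention that 'a' in position 09 
means UCS/Unicode (UTF-8 in practice) and a blank means MARC-8.

```perl
use strict;
use warnings;

# Hypothetical helper: report the encoding implied by leader position 09
# of a raw MARC record ('a' = UCS/Unicode, blank = MARC-8).
sub leader_encoding {
    my ($raw) = @_;
    return undef if !defined $raw || length($raw) < 24;   # a leader is 24 bytes
    my $flag = substr( $raw, 9, 1 );
    return 'UTF-8'  if $flag eq 'a';
    return 'MARC-8' if $flag eq ' ';
    return "unknown ('$flag')";
}

# Optional usage: pass a MARC file name to inspect its first record.
if (@ARGV) {
    open my $fh, '<:raw', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my $leader = '';
    read( $fh, $leader, 24 );      # read the leader as raw bytes
    close $fh;
    my $enc = leader_encoding($leader);
    $enc = 'file too short for a leader' unless defined $enc;
    print "$enc\n";
}
```

Run against the first record only, this tells you what the file claims to be, 
not what the data actually is; a record flagged as MARC-8 can still contain 
UTF-8 bytes, which is when setting the encoding by hand is worth trying.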

regards

Alan

--  
Alan Brown
Library Systems Liaison Officer
Bury Library Service
Resource Services
Textile Hall
Manchester Rd
Bury BL9 0DG
0161 253 5877
http://www.bury.gov.uk/libraries
http://library.bury.gov.uk




-----Original Message-----
From: Eric Lease Morgan [mailto:emor...@nd.edu] 
Sent: 26 March 2013 20:22
To: perl4lib@perl.org
Subject: reading and writing of utf-8 with marc::batch


For the life of me I can't figure out how to do reading and writing of UTF-8 
with MARC::Batch.

I have a UTF-8 encoded file of MARC records. Dumping the records and grepping 
for a particular string illustrates the validity:

  $ marcdump und.marc | grep Sainte-Face
  und.marc
  1000 records
  2000 records
  3000 records
  4000 records
  5000 records
  6000 records
  7000 records
  8000 records
  9000 records
  10000 records
  11000 records
  12000 records
  245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
  610 20 _aArchiconfrérie de la Sainte-Face
  13000 records
  $ 

I then run a Perl script that simply reads each record and dumps it to STDOUT. 
Notice how I define both my input and output as UTF-8:

  #!/shared/perl/current/bin/perl

  # configure
  use constant MARC => './und.marc';

  # require
  use strict;
  use MARC::Batch;

  # initialize
  binmode ( MARC, ":utf8" );
  my $batch = MARC::Batch->new( 'USMARC', MARC );
  $batch->strict_off;
  $batch->warnings_off;
  binmode( STDOUT, ":utf8" );

  # read & write
  while ( my $marc = $batch->next ) { print $marc->as_usmarc }

  # done
  exit;

But my output is munged:

  $ ./marc.pl > und.mrc
  $ marcdump und.mrc | grep Sainte-Face
  und.mrc
  1000 records
  2000 records
  3000 records
  4000 records
  5000 records
  6000 records
  7000 records
  8000 records
  9000 records
  10000 records
  11000 records
  12000 records
  245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
  610    _aArchiconfrérie de la Sainte-Face
  13000 records
  $

What am I doing wrong!?

--
Eric Lease Morgan
University of Notre Dame

574/631-8604


