Re: reading and writing of utf-8 with marc::batch [double encoding]

2013-03-28 Thread Ashley Sanders

Eric,

> How can I figure out whether or not a MARC record contains ONLY characters
> from the UTF-8 character set?

You can use a regex to check if a string is utf-8. There are various examples
floating around the internet. An example is the one here:

   http://www.w3.org/International/questions/qa-forms-utf-8

You'll need to add the MARC control characters ^_, ^^, and ^] (0x1F, 0x1E, and
0x1D) to the ASCII part of the expression on that page. (I think the W3C example
is aimed at XML 1.0, in which the MARC control characters are not allowed.)
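
As a rough sketch of that idea (not the W3C page's code verbatim): the pattern
below follows the byte ranges given on the W3C page and adds the three MARC
delimiters to the ASCII branch. The helper name looks_like_utf8 is made up for
illustration, and the input is assumed to be a raw byte string, for example a
record read from a handle in ':bytes' mode.

```perl
use strict;
use warnings;

# One well-formed UTF-8 character, following the W3C byte ranges, with the
# MARC delimiters 0x1D, 0x1E, and 0x1F added to the ASCII branch.
my $UTF8_CHAR = qr{
    [\x09\x0A\x0D\x1D\x1E\x1F\x20-\x7E]    # ASCII plus MARC delimiters
  | [\xC2-\xDF][\x80-\xBF]                 # non-overlong 2-byte
  | \xE0[\xA0-\xBF][\x80-\xBF]             # excluding overlongs
  | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}      # straight 3-byte
  | \xED[\x80-\x9F][\x80-\xBF]             # excluding surrogates
  | \xF0[\x90-\xBF][\x80-\xBF]{2}          # planes 1-3
  | [\xF1-\xF3][\x80-\xBF]{3}              # planes 4-15
  | \xF4[\x80-\x8F][\x80-\xBF]{2}          # plane 16
}x;

# Hypothetical helper: true if every byte of $bytes belongs to a
# well-formed UTF-8 sequence or is one of the MARC delimiters.
sub looks_like_utf8 {
    my ($bytes) = @_;
    pos($bytes) = 0;
    # Consume the string in chunks to stay clear of regex repeat limits.
    1 while $bytes =~ /\G(?:$UTF8_CHAR){1,1000}/gc;
    return ( pos($bytes) // 0 ) == length $bytes;
}
```

Matching in chunks with \G is a design choice for long records; a single
anchored pattern with a trailing * expresses the same test but can run into
repeat limits on some versions of perl.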

Ashley.
--
Ashley Sanders a.sand...@manchester.ac.uk
http://copac.ac.uk -- A Mimas service funded by JISC at the University of 
Manchester

Re: reading and writing of utf-8 with marc::batch [resolved; gigo]

2013-03-28 Thread Eric Lease Morgan


Thank you for all the input, and I think I have resolved my particular issue. 
Battle won. War still raging.

Using the script suggested by Galen as a starting point, I wrote the following 
hack, which prints the number of each MARC record containing non-UTF-8 
characters. The script output nothing; all the data in all of my records was 
encoded as UTF-8:

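  #!/usr/bin/perl

  # require
  use strict;
  use warnings;
  use Encode;

  # initialize
  binmode STDIN, ':bytes';
  $/ = "\035";    # MARC record terminator
  my $i = 0;

  # read STDIN
  while ( my $record = <STDIN> ) {

    # increment
    $i++;

    # check validity; decode() croaks on malformed UTF-8, whereas
    # Encode::is_utf8() only inspects Perl's internal flag and never croaks
    eval { Encode::decode( 'UTF-8', $record, Encode::FB_CROAK ); };

    # check for error
    if ( $@ ) { print "Record $i contains non-UTF-8 characters\n"; }

  }

  # done
  exit;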


Since all of the data in all of my records was UTF-8, the leader of every 
record needs to have a value of 'a' in position #9. So I wrote the following 
hack (circumventing MARC::Batch):

  #!/usr/bin/perl

  # require
  use strict;

  # initialize
  binmode STDIN,  ':bytes';
  binmode STDOUT, ':bytes';
  $/ = "\035";    # MARC record terminator

  # loop through the input
  while ( <STDIN> ) {

    # do the work and output: set Leader/09 to 'a' (UTF-8)
    substr( $_, 9, 1 ) = 'a';
    print $_;

  }

  # done
  exit;


I then fed the output of my fix routine to my indexing routine, and all of my 
problems seemed to go away. GIGO?

I'm still not sure, but I think deep within MARC::Batch some sort of encoding 
is observed, honored, and output. And when the denoted encoding is not true and 
things like binmode( FILE, ':utf8' ) get called, output gets munged. Again, I'm 
not sure. It is almost exhausting.
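
One way to see how a ':utf8' output layer can munge data that is already 
UTF-8, independent of MARC::Batch, is sketched below. It prints the two bytes 
of an encoded e-acute through different layers into an in-memory filehandle. 
The helper name bytes_written is hypothetical, and this only illustrates 
general Perl I/O behaviour; it makes no claim about what MARC::Batch does 
internally.

```perl
use strict;
use warnings;

# Hypothetical helper: print $data through the given PerlIO layer into an
# in-memory file and return the bytes that were written, as a hex string.
sub bytes_written {
    my ( $layer, $data ) = @_;
    my $buffer = '';
    open my $fh, ">$layer", \$buffer or die "cannot open in-memory file: $!";
    print {$fh} $data;
    close $fh;
    return unpack 'H*', $buffer;
}

my $e_acute = "\xC3\xA9";    # UTF-8 bytes for e-acute, held as a byte string

print bytes_written( ':raw',  $e_acute ), "\n";    # bytes pass through untouched
print bytes_written( ':utf8', $e_acute ), "\n";    # each byte is encoded again
```

If the string had first been decoded into characters, the ':utf8' layer would 
write it correctly; the trouble only arises when undecoded bytes meet an 
encoding layer.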


-- 
Eric Morgan
University of Notre Dame

FW: reading and writing of utf-8 with marc::batch

2013-03-27 Thread Brown, Alan

Eric,

Have you tried checking how MARC::Batch views the encoding?

e.g.

# read & write
while ( my $marc = $batch->next ) { print $marc->encoding(); print 
$marc->as_usmarc; }

It is supposed to pick up the encoding from position 09 in the leader, but I am 
not sure this is totally reliable. If you know this is definitely a utf8 file 
you can manually set the encoding (but you shouldn't have to).

e.g.

# read & write
  while ( my $marc = $batch->next ) { $marc->encoding('UTF-8'); print 
$marc->as_usmarc; }

regards

Alan

--  
Alan Brown
Library Systems Liaison Officer
Bury Library Service
Resource Services
Textile Hall
Manchester Rd
Bury BL9 0DG
0161 253 5877
http://www.bury.gov.uk/libraries
http://library.bury.gov.uk




-Original Message-
From: Eric Lease Morgan [mailto:emor...@nd.edu] 
Sent: 26 March 2013 20:22
To: perl4lib@perl.org
Subject: reading and writing of utf-8 with marc::batch


For the life of me I can't figure out how to do reading and writing of UTF-8 
with MARC::Batch.

I have a UTF-8 encoded file of MARC records. Dumping the records and greping 
for a particular string illustrates the validity:

  $ marcdump und.marc | grep Sainte-Face
  und.marc
  1000 records
  2000 records
  3000 records
  4000 records
  5000 records
  6000 records
  7000 records
  8000 records
  9000 records
  10000 records
  11000 records
  12000 records
  245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
  610 20 _aArchiconfrérie de la Sainte-Face
  13000 records
  $ 

I then run a Perl script that simply reads each record and dumps it to STDOUT. 
Notice how I define both my input and output as UTF-8:

  #!/shared/perl/current/bin/perl

  # configure
  use constant MARC => './und.marc';

  # require
  use strict;
  use MARC::Batch;

  # initialize
  binmode ( MARC, ":utf8" );
  my $batch = MARC::Batch->new( 'USMARC', MARC );
  $batch->strict_off;
  $batch->warnings_off;
  binmode( STDOUT, ":utf8" );

  # read & write
  while ( my $marc = $batch->next ) { print $marc->as_usmarc }

  # done
  exit;

But my output is munged:

  $ ./marc.pl > und.mrc
  $ marcdump und.mrc | grep Sainte-Face
  und.mrc
  1000 records
  2000 records
  3000 records
  4000 records
  5000 records
  6000 records
  7000 records
  8000 records
  9000 records
  10000 records
  11000 records
  12000 records
  245 00 _aAnnales de l'ArchiconfrÃ©rie de la Sainte-Face
  610_aArchiconfrÃ©rie de la Sainte-Face
  13000 records
  $

What am I doing wrong!?

--
Eric Lease Morgan
University of Notre Dame

574/631-8604




RE: reading and writing of utf-8 with marc::batch

2013-03-27 Thread KREYCHE, MICHAEL

Eric--

I'm with Leif. The output you got looks like utf-8 displayed on a terminal that 
doesn't support it. Whether you need to fix the terminal display is another 
matter--I've never felt compelled to do so. 

Anyway, I think you can now sign yourself Eric "Did-it-right-the-first-time" 
Morgan!

Mike

> -----Original Message-----
> From: Leif Andersson [mailto:leif.anders...@sub.su.se]
> Sent: Tuesday, March 26, 2013 5:57 PM
> To: Eric Lease Morgan; perl4lib@perl.org
> Subject: Re: reading and writing of utf-8 with marc::batch
>
> Hi Eric,
>
> my first guess would be your terminal is not utf8.
> If you comment out
> #binmode( STDOUT, ":utf8" );
> and that does the trick, then you can start looking for how to change
> your terminal settings.
> (And that can sometimes be a rather frustrating task, I'm afraid)
>
> /Leif Andersson
> Stockholm UL
 

Re: reading and writing of utf-8 with marc::batch

2013-03-27 Thread Jon Gorman

Ok, I can't claim to be an expert, but from my own experience, I'd say
Paul is very likely right about double-encoding occurring.  However,
the question ends up being where that happens, and in this case I
suspect how MARC::Batch will work could depend heavily on what version
of perl you're running and what version of MARC::Batch you're running.
Knowing those might help too (I'd try to be on a later version of perl
and the latest MARC::Batch).  It also depends on how you're generating
the MARC record, which isn't really clear to me.

It could also be the leaders or the terminal, as others have suggested.

One piece of advice is not to trust the terminal directly but pipe
into xxd. (And if possible, just try transforming the offending
record).  Or use yaz-marcdump -v, which will also give the hex if I
remember correctly.  (If it's c3 a9 in both cases, you know the
terminal is at fault)

Then try doing that without the binmode, w/ binmode :raw, etc.
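
For anyone without xxd to hand, roughly the same check can be sketched in
Perl itself. The helper names hex_bytes and hex_after are hypothetical, and
the data is assumed to be raw bytes read from a file opened in ':raw' mode.

```perl
use strict;
use warnings;

# Hypothetical helper: return the bytes of $data as space-separated hex pairs.
sub hex_bytes {
    my ($data) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $data;
}

# Hypothetical helper: for every occurrence of $prefix, return the hex of
# the next $count bytes (fewer at end of data), so the bytes that follow
# "Archiconfr" in the two files can be compared without trusting the terminal.
sub hex_after {
    my ( $data, $prefix, $count ) = @_;
    my @found;
    while ( $data =~ /\Q$prefix\E(.{0,$count})/sg ) {
        push @found, hex_bytes($1);
    }
    return @found;
}
```

Calling hex_after( $contents, 'Archiconfr', 4 ) on the contents of each file
should make it plain whether the bytes themselves differ or only their
display does.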

Jon Gorman

Re: reading and writing of utf-8 with marc::batch

2013-03-27 Thread Galen Charlton

Hi,

On Wed, Mar 27, 2013 at 7:01 AM, Jon Gorman <jonathan.gor...@gmail.com> wrote:

> One piece of advice is not to trust the terminal directly but pipe
> into xxd. (And if possible, just try transforming the offending
> record).  Or use yaz-marcdump -v, which will also give the hex if I
> remember correctly.  (If it's c3 a9 in both cases, you know the
> terminal is at fault)


Another trick is to pipe the output through less with the LESSCHARSET
environment variable set to 'ascii'.  Bytes whose value is less than 32 or
greater than 126 will be displayed as reverse-video hexadecimal numbers,
e.g.,

<subfield code="a">Garci<CC><81>a Ma<CC><81>rquez, Gabriel,</subfield>

Regards,

Galen
-- 
Galen Charlton
gmcha...@gmail.com

Re: reading and writing of utf-8 with marc::batch [terminal]

2013-03-27 Thread Eric Lease Morgan


On Mar 26, 2013, at 5:57 PM, Leif Andersson <leif.anders...@sub.su.se> wrote:

> my first guess would be your terminal is not utf8.

While I'm not positive my terminal is doing UTF-8, I think it is. When I dump 
in the beginning the output to the terminal is correct. After I run my script 
the output to the same terminal is incorrect. 

--
Eric Lease Morgan

Re: reading and writing of utf-8 with marc::batch [terminal]

2013-03-27 Thread Galen Charlton

Hi Eric,

On Wed, Mar 27, 2013 at 10:26 AM, Eric Lease Morgan <emor...@nd.edu> wrote:

> While I'm not positive my terminal is doing UTF-8, I think it is. When I
> dump in the beginning the output to the terminal is correct. After I run my
> script the output to the same terminal is incorrect.


Would you be willing to put up a link to your MARC file?  I'm willing to
take a quick look to see if I can reproduce the problem you're seeing.

Regards,

Galen
-- 
Galen Charlton
gmcha...@gmail.com

Re: reading and writing of utf-8 with marc::batch

2013-03-27 Thread Shelley Doljack

Whenever I see characters like Ã©, I consult this website 
http://www.i18nqa.com/debug/bug-utf-8-latin1.html to help me figure out what's 
going on. You might find it helpful too.

Shelley


-- 
Shelley Doljack  
E-Resources Metadata Librarian 
Metadata Department
Stanford University Libraries
sdolj...@stanford.edu
650-725-0167

Re: reading and writing of utf-8 with marc::batch [double encoding]

2013-03-27 Thread Eric Lease Morgan


A number of people have alluded to the problem of double encoding, and I'm 
beginning to think this is true. 

I have isolated a number of problem records. They all contain diacritics, but 
they do not have an 'a' in position #9 of the leader -- 
http://dh.crc.nd.edu/tmp/original.marc  Can someone verify that the file 
contains UTF-8 characters for me?

For these same records I have also added an 'a' in position #9 and created a 
similar file -- http://dh.crc.nd.edu/tmp/fixed.marc  

Is it true that original.marc is not denoted correctly, but fixed.marc is 
denoted correctly?

-- 
Eric Morgan

Re: reading and writing of utf-8 with marc::batch [double encoding]

2013-03-27 Thread Galen Charlton

Hi,

On Wed, Mar 27, 2013 at 11:20 AM, Eric Lease Morgan <emor...@nd.edu> wrote:

> I have isolated a number of problem records. They all contain diacritics,
> but they do not have an 'a' in position #9 of the leader --
> http://dh.crc.nd.edu/tmp/original.marc  Can someone verify that the file
> contains UTF-8 characters for me?

I've eyeballed it and confirm that the encoding of that file is UTF-8.

> For these same records I have also added an 'a' in position #9 and created
> a similar file -- http://dh.crc.nd.edu/tmp/fixed.marc

I've looked this over as well.

> Is it true that original.marc is not denoted correctly, but fixed.marc is
> denoted correctly?


Yes.  The Leader/09 must be set to 'a' if the character encoding in use is
UTF-8.
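
As a small illustration of reading that flag straight from the raw bytes,
without MARC::Batch: in MARC 21, Leader/09 is the tenth byte of the record,
where 'a' means UCS/Unicode and a blank means MARC-8. The helper name
declared_encoding is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: report the encoding that a raw MARC 21 record
# declares in Leader/09 (the tenth byte of the record).
sub declared_encoding {
    my ($record) = @_;
    return if length($record) < 24;    # the leader is 24 bytes long
    my $flag = substr $record, 9, 1;
    return $flag eq 'a' ? 'UTF-8'
         : $flag eq ' ' ? 'MARC-8'
         :                "unknown ($flag)";
}
```

This only reports what the record claims about itself; it says nothing about
whether the data in the fields actually matches the claim.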

Regards,

Galen
-- 
Galen Charlton
gmcha...@gmail.com

Re: reading and writing of utf-8 with marc::batch [double encoding]

2013-03-27 Thread Eric Lease Morgan


On Mar 27, 2013, at 4:59 PM, Eric Lease Morgan <emor...@nd.edu> wrote:

> When it calls as_usmarc, I think MARC::Batch tries to honor the value set in
> position #9 of the leader. In other words, if that position is blank, then it
> tries to output records as MARC-8, and when it has a value of 'a', it
> tries to encode the data as UTF-8.

How can I figure out whether or not a MARC record contains ONLY characters from 
the UTF-8 character set?

Put another way, how can I determine whether or not position #9 of a given MARC 
leader is accurate? If position #9 is an 'a', then how can I read the balance 
of the record to determine whether or not all the characters really and truly 
are UTF-8 encoded?

--
Eric "This Is Almost Too Much For Me" Morgan

Re: reading and writing of utf-8 with marc::batch [double encoding]

2013-03-27 Thread Shelley Doljack

I use MarcEdit to view records and check if the mnemonic form of a diacritic 
(e.g. {eacute}) appears or not and what the LDR/09 value is. That's the best 
way I've come up with so far. MarcEdit is pretty good at guessing what the 
character encoding is without relying on the LDR/09 value. I think there are 
some perl modules you could use that guess what the encoding is of a 
character but I've never used them. I'm interested in finding out other methods 
(preferably automated) for detecting wrong or mixed character encodings in a 
MARC record. 

Shelley


Re: reading and writing of utf-8 with marc::batch [double encoding]

2013-03-27 Thread Galen Charlton

Hi,


On Wed, Mar 27, 2013 at 2:11 PM, Eric Lease Morgan <emor...@nd.edu> wrote:

> Put another way, how can I determine whether or not position #9 of a given
> MARC leader is accurate? If position #9 is an 'a', then how can I read the
> balance of the record to determine whether or not all the characters really
> and truly are UTF-8 encoded?


The following program will read a file of MARC records from standard input
and classify each as either being valid UTF-8 or not.

___START
#!/usr/bin/perl

use strict;
use warnings;
use Encode;

binmode STDIN, ':bytes';

$/ = "\035"; # MARC record terminator
my $i = 0;
while (<STDIN>) {
    $i++;
    my $bytes = $_;
    # decode() croaks on malformed input when FB_CROAK is set
    eval {
        my $utf8str = Encode::decode('UTF-8', $bytes, Encode::FB_CROAK);
    };
    if ($@) {
        print "Record $i definitely not valid UTF-8\n";
    } else {
        print "Record $i is valid UTF-8\n";
    }
}
___END
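
A possible extension of the same idea, sketched under assumptions: the
hypothetical helper check_record below compares what Leader/09 claims with
how the bytes of a raw record actually decode. The wording of the verdicts is
invented for illustration. Note that a record containing only ASCII is valid
under either setting, so the flag can only be second-guessed when bytes above
0x7F are present.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: compare the Leader/09 claim of a raw MARC record
# with what its bytes actually look like, and return a short verdict.
sub check_record {
    my ($record) = @_;
    my $claims_utf8 = substr( $record, 9, 1 ) eq 'a';
    my $has_high    = $record =~ /[\x80-\xFF]/;

    my $copy = $record;    # decode() may modify its argument
    my $valid_utf8 = eval {
        Encode::decode( 'UTF-8', $copy, Encode::FB_CROAK );
        1;
    } ? 1 : 0;

    return 'pure ASCII, either value is consistent' if !$has_high;
    return 'ok: flagged UTF-8 and decodes cleanly'  if $claims_utf8 && $valid_utf8;
    return 'suspect: flagged UTF-8 but malformed'   if $claims_utf8;
    return 'suspect: not flagged but valid UTF-8'   if $valid_utf8;
    return 'not UTF-8 (MARC-8 or something else)';
}
```

A "valid UTF-8" verdict is strong evidence rather than proof, since a byte
sequence in another encoding can occasionally happen to be well-formed UTF-8.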

Regards,

Galen
-- 
Galen Charlton
gmcha...@gmail.com

Re: reading and writing of utf-8 with marc::batch

2013-03-26 Thread Paul Hoffman

On Tue, Mar 26, 2013 at 04:22:03PM -0400, Eric Lease Morgan wrote:
> For the life of me I can't figure out how to do reading and writing of
> UTF-8 with MARC::Batch.
>
> I have a UTF-8 encoded file of MARC records. Dumping the records and
> greping for a particular string illustrates the validity:
>
>   $ marcdump und.marc | grep Sainte-Face

What is marcdump?

>   245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
>   610 20 _aArchiconfrérie de la Sainte-Face
>   13000 records
>   $
>
> I then run a Perl script that simply reads each record and dumps it to
> STDOUT. Notice how I define both my input and output as UTF-8:

Try *not* calling binmode and see what happens.  Or just call 
binmode(MARC) without the ':utf8' layer.

>   245 00 _aAnnales de l'ArchiconfrÃ©rie de la Sainte-Face
>   610_aArchiconfrÃ©rie de la Sainte-Face
>   13000 records
>   $

This looks like double-encoding:

0000  6c 27 41 72 63 68 69 63  6f 6e 66 72 c3 83 c2 a9  |l'ArchiconfrÃ.©|
0010  72 69 65                                          |rie|

LATIN SMALL LETTER E WITH ACUTE is supposed to be c3 a9 (as it is in the 
first marcdump output) not c3 83 c2 a9.
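
The byte pattern above can be reproduced with the core Encode module. This is
only a sketch of how such bytes can arise, not a diagnosis of the script in
question. The helper undo_double_encoding is hypothetical and assumes the
second encoding treated the bytes as ISO-8859-1; if Windows-1252 was involved
instead, some characters would not round-trip.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $once  = encode( 'UTF-8', "\x{E9}" );    # e-acute encoded once
my $twice = encode( 'UTF-8', $once );       # bytes taken for Latin-1, encoded again

print join( ' ', unpack '(H2)*', $once ),  "\n";
print join( ' ', unpack '(H2)*', $twice ), "\n";

# Hypothetical repair: undo one layer of encoding. Only appropriate when the
# data really was double-encoded; croaks if the input is not valid UTF-8 or
# if the decoded text contains characters above U+00FF.
sub undo_double_encoding {
    my ($bytes) = @_;
    my $chars = decode( 'UTF-8', $bytes, Encode::FB_CROAK );
    return encode( 'ISO-8859-1', $chars, Encode::FB_CROAK );
}
```

Applying such a repair blindly to a file that mixes correctly encoded and
double-encoded records would damage the good ones, so it is safer to test
each record first.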

Paul.

-- 
Paul Hoffman nkui...@nkuitse.com

Re: reading and writing of utf-8 with marc::batch

2013-03-26 Thread Timothy Prettyman

Do your records have the utf8 encoding byte set  in the LDR? (Byte 9 should
be 'a' for utf8).

-Tim

Timothy Prettyman
University of Michigan LIbrary/LIT



Re: reading and writing of utf-8 with marc::batch

2013-03-26 Thread Leif Andersson

Hi Eric,

my first guess would be your terminal is not utf8.
If you comment out
#binmode( STDOUT, ":utf8" );
and that does the trick, then you can start looking for how to change your 
terminal settings.
(And that can sometimes be a rather frustrating task, I'm afraid)

/Leif Andersson
Stockholm UL

