On Tue, Mar 26, 2013 at 04:22:03PM -0400, Eric Lease Morgan wrote:
> For the life of me I can't figure out how to do reading and writing of
> UTF-8 with MARC::Batch.
>
> I have a UTF-8 encoded file of MARC records. Dumping the records and
> grepping for a particular string illustrates the validity:
>
> $ marcdump und.marc | grep Sainte-Face

What is marcdump?

> 245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
> 610 20 _aArchiconfrérie de la Sainte-Face
> 13000 records
> $
>
> I then run a Perl script that simply reads each record and dumps it to
> STDOUT. Notice how I define both my input and output as UTF-8:

Try *not* calling binmode and see what happens. Or just call binmode(MARC)
without the ':utf8' layer.

> 245 00 _aAnnales de l'Archiconfrérie de la Sainte-Face
> 610 _aArchiconfrérie de la Sainte-Face
> 13000 records
> $

This looks like double-encoding:

    00000000  6c 27 41 72 63 68 69 63 6f 6e 66 72 c3 83 c2 a9  |l'ArchiconfrÃ.©|
    00000010  72 69 65                                          |rie|

LATIN SMALL LETTER E WITH ACUTE is supposed to be c3 a9 (as it is in the
first marcdump output), not c3 83 c2 a9.

Paul.

-- 
Paul Hoffman <nkui...@nkuitse.com>
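The byte pattern in the hexdump above can be reproduced in a few lines. This is a sketch in Python rather than Perl, meant only to show the mechanics of double encoding. It assumes (the dump alone does not prove this) that the already-UTF-8 bytes were taken as Latin-1 characters, one character per byte, and then encoded to UTF-8 a second time. Perl behaves this way when a string holding raw UTF-8 bytes, without its UTF-8 flag set, is printed to a handle that carries a ':utf8' layer.

```python
# Sketch: how U+00E9 (e with acute) turns into c3 83 c2 a9.
# Assumption: the UTF-8 bytes are misread as Latin-1 (one char per byte)
# and then encoded as UTF-8 again.

correct = "\u00e9".encode("utf-8")      # b'\xc3\xa9', the valid encoding
misread = correct.decode("latin-1")     # two characters: U+00C3 and U+00A9
double = misread.encode("utf-8")        # each character re-encoded separately

print(correct.hex(" "))  # c3 a9
print(double.hex(" "))   # c3 83 c2 a9

# Peeling off the extra layer recovers the original character.
repaired = double.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == "\u00e9")  # True
```

The last step runs the mistake in reverse, which is one way to repair text that has already been encoded twice; getting the filehandle layers right avoids producing it in the first place.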