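I'd echo Jonathan's question -- 0xC2 is the sound recording copyright mark (℗) in MARC-8. In UTF-8, 0xC2 is only valid as the first byte of a two-byte sequence, so a lone 0xC2 produces exactly this kind of error. I'd guess the file isn't in UTF-8.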
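
For what it's worth, here is a rough way to check whether the leader and the actual bytes agree. This is a minimal sketch in Python rather than Perl, it does not use MARC::Batch, and the function name and sample records are made up for illustration. It relies on only two things: leader position 09 is "a" for UCS/Unicode and blank for MARC-8, and a strict UTF-8 decode fails on a stray 0xC2.

```python
def guess_marc_encoding(record: bytes) -> dict:
    """Compare what the leader claims with what the bytes look like.

    Hypothetical helper for illustration; `record` is one raw MARC record.
    """
    # Leader position 09 (zero-based): b"a" = UCS/Unicode, blank = MARC-8.
    claimed = "utf-8" if record[9:10] == b"a" else "marc-8"

    # A strict decode raises on a 0xC2 that is not followed by a
    # continuation byte (0x80-0xBF).
    try:
        record.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    return {"claimed": claimed, "valid_utf8": valid_utf8}


# Made-up 24-byte leaders followed by a fragment of field data.
UTF8_LEADER = b"00000nam a2200000 a 4500"
MARC8_LEADER = b"00000nam  2200000 a 4500"

# Leader says UTF-8, but the data holds a MARC-8 0xC2 (sound recording mark).
mislabeled = UTF8_LEADER + b"\xc2 2011 Some Label"
print(guess_marc_encoding(mislabeled))
```

If "claimed" is "utf-8" but "valid_utf8" is False, the leader is probably wrong and the record is really MARC-8. The reverse check is weaker: plain ASCII is valid in both encodings, so a clean decode does not prove the record is UTF-8.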
--TR

> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Jonathan Rochkind
> Sent: Wednesday, April 06, 2011 1:28 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] utf8 "\xC2" does not map to Unicode
>
> I am not familiar with that Perl module. But I'm more familiar than I'd want
> with char encoding in Marc.
>
> I don't recognize the bytes 0xC2 (there are some bytes I became pathetically
> familiar with in past debugging, but I've forgotten em), but the first things
> to look at:
>
> 1. Is your Marc file encoded in Marc8 or UTF-8? I'm betting Marc8.
> Theoretically there is a Marc leader byte that tells you whether it's
> Marc8 or UTF-8, but the leader byte is often wrong in real world records.
> Is it wrong?
>
> 2. Does Perl MARC::Batch have a function to convert from Marc8 to
> UTF-8? If so, how does it decide whether to convert? Is it trying to
> do that? Is it assuming that the leader byte of the record accurately
> identifies the encoding, and if so, is the leader byte wrong? Is it
> trying to convert from Marc8 to UTF-8, when the source was UTF-8 in the
> first place? Or is it assuming the source was UTF-8 in the first place,
> when in fact it was Marc8?
>
> Not the answer you wanted, maybe someone else will have that. Debugging
> char encoding is hands down the most annoying kind of debugging I ever do.
>
> On 4/6/2011 4:13 PM, Eric Lease Morgan wrote:
> > Ack! While using the venerable Perl MARC::Batch module I get the
> > following error while trying to read a MARC record:
> >
> >   utf8 "\xC2" does not map to Unicode
> >
> > This is a real pain, and I'm hoping someone here can help me either:
> > 1) trap this error allowing me to move on, or 2) figure out how to open
> > the file "correctly".