Re: [CODE4LIB] utf8 "\xC2" does not map to Unicode

Reese, Terry Wed, 06 Apr 2011 14:07:22 -0700

I'd echo Jonathan's question -- the 0xC2 code is the sound recording
copyright mark in MARC-8.  I'd guess the file isn't in UTF-8.
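
One way to check that guess without going through MARC::Batch at all is to
compare what each record's leader claims with what the bytes actually are.
The following is only a minimal sketch in plain Python: the names
diagnose_record and split_records are made up for illustration, and it
assumes the file is ordinary ISO 2709 MARC with 0x1D record terminators.
Leader position 9 is 'a' for UCS/Unicode (UTF-8) and blank for MARC-8.

```python
def split_records(data: bytes) -> list:
    """Split a MARC transmission file on the record terminator (0x1D)."""
    return [chunk + b"\x1d" for chunk in data.split(b"\x1d") if chunk.strip()]


def diagnose_record(raw: bytes) -> dict:
    """Compare the encoding the leader claims with what the bytes look like."""
    leader = raw[:24]
    # Leader position 9: b"a" means UCS/Unicode (UTF-8), blank means MARC-8.
    claims_utf8 = leader[9:10] == b"a"
    try:
        raw.decode("utf-8")
        valid_utf8 = True
    except UnicodeDecodeError:
        # For example, a lone 0xC2 that is not followed by a continuation byte.
        valid_utf8 = False
    return {
        "claims_utf8": claims_utf8,
        "valid_utf8": valid_utf8,
        # The leader says UTF-8 but the bytes cannot be UTF-8.
        "mismatch": claims_utf8 and not valid_utf8,
    }
```

A record reporting "mismatch" is a likely candidate for the error Eric is
seeing. Note that the reverse check is weaker: a MARC-8 record containing
only ASCII also decodes cleanly as UTF-8, so valid_utf8 alone does not prove
the record really is UTF-8.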


--TR

> -----Original Message-----
> From: Code for Libraries [mailto:[email protected]] On Behalf Of
> Jonathan Rochkind
> Sent: Wednesday, April 06, 2011 1:28 PM
> To: [email protected]
> Subject: Re: [CODE4LIB] utf8 "\xC2" does not map to Unicode
> 
> I am not familiar with that Perl module. But I'm more familiar than I'd want
> with char encoding in Marc.
> 
> I don't recognize the byte 0xC2 (there are some bytes I became pathetically
> familiar with in past debugging, but I've forgotten em), but the first things
> to look at:
> 
> 1. Is your Marc file encoded in Marc8 or UTF-8?  I'm betting Marc8.
> Theoretically there is a Marc leader byte that tells you whether it's
> Marc8 or UTF-8, but the leader byte is often wrong in real world records.
> Is it wrong?
> 
> 2. Does Perl MARC::Batch have a function to convert from Marc8 to
> UTF-8?   If so, how does it decide whether to convert? Is it trying to
> do that?  Is it assuming that the leader byte in the record accurately
> identifies the encoding, and if so, is the leader byte wrong?   Is it
> trying to convert from Marc8 to UTF-8, when the source was UTF-8 in the
> first place?  Or is it assuming the source was UTF-8 in the first place,
> when in fact it was Marc8?
> 
> Not the answer you wanted, maybe someone else will have that. Debugging
> char encoding is hands down the most annoying kind of debugging I ever do.
> 
> On 4/6/2011 4:13 PM, Eric Lease Morgan wrote:
> > Ack! While using the venerable Perl MARC::Batch module I get the
> > following error while trying to read a MARC record:
> >
> >    utf8 "\xC2" does not map to Unicode
> >
> > This is a real pain, and I'm hoping someone here can help me either: 1) trap
> > this error allowing me to move on, or 2) figure out how to open the file
> > "correctly".
> >
