Hi Al,

> For me I've found the best solution is to leave Encode.pm alone
> and redefine the offending subroutine within my processing script.

This was timely help for me, too, since I've been getting fatal errors when
processing a large file of bibs with MARC::Record. Thanks! (Although, when I
checked, I had Encode.pm version 2.12.)

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/

> -----Original Message-----
> From: Al [mailto:ra...@berkeley.edu]
> Sent: Tuesday, May 17, 2011 9:27 AM
> To: Mike Barrett; perl4lib@perl.org
> Subject: Re: Invalid UTF-8 characters causing MARC::Record crash.
>
> >Anybody ever see this before?
>
> All. The. Time.
>
> When I use Encode.pm version 2.12 I don't have this problem. But it occurs
> repeatedly with version 2.40.
>
> There are a few different solutions, but I'm assuming, like me, that it's
> not practical for you to clean up your MARC records *before* you try and
> process them. So you can downgrade your Encode.pm or modify it to make it
> less demanding. For me I've found the best solution is to leave Encode.pm
> alone and redefine the offending subroutine within my processing script. I
> paste this in at the bottom of every script:
>
>   package Encode;
>   use Encode::Alias;
>   no warnings 'redefine';   # silence "Subroutine decode redefined"
>
>   # Same prototype as Encode::decode, but if the underlying decoder
>   # dies (e.g. on malformed UTF-8), return the original octets
>   # instead of letting the error propagate.
>   sub decode($$;$)
>   {
>       my ($name,$octets,$check) = @_;
>       my $altstring = $octets;        # untouched copy for the fallback
>       return undef unless defined $octets;
>       $octets .= '' if ref $octets;
>       $check ||= 0;
>       my $enc = find_encoding($name);
>       unless (defined $enc) {
>           require Carp;
>           Carp::croak("Unknown encoding '$name'");
>       }
>       my $string;
>       eval { $string = $enc->decode($octets,$check); };
>       $_[1] = $octets if $check and !($check & LEAVE_SRC());
>       if ($@) {
>           return $altstring;          # decode failed: hand back raw octets
>       } else {
>           return $string;
>       }
>   }
>
> But I'll be interested in other solutions people may bring up.
>
> Good luck!
>
> Al
>
>
> At 5/17/2011, Mike Barrett wrote:
> >I'm using MARC::Batch and MARC::Field to iterate through a text file of
> >bibliographic records from Voyager.
> >
> >The unrecoverable error is actually occurring in the Perl Unicode module
> >which is, of course, called by MARC::Record.
> >It's running into "invalid UTF-8 character 0xC2."
> >When I looked up the Unicode character list, all of the C2 entries are four
> >hex characters, so it appears that the second half is missing.
> >
> >After looking at the records in Voyager (using Arial Unicode MS font), I
> >find that all of the problem records I've found are maps with Field 255|a
> >[scale] |b [projection] |c [geo coordinates].
> >
> >Here's an example:
> >As it appears in the text file: c(W 106¿¿¿30¿¿00¿¿--W
> >104¿¿¿52¿¿30¿¿/N 39¿¿¿22¿¿30¿¿--N 37¿¿¿15¿¿00¿¿).
> >As it appears in Voyager Cataloging module: ‡a Scale 1:126,720 â‡c (W
> >106â°30ʹ00ʺ--W 104â°52ʹ30ʺ/N 39â°22ʹ30ʺ--N 37â°15ʹ00ʺ).
> >
> >
> >Thanks,
> >Mike Barrett
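
For anyone hitting the same error, here is a small standalone sketch of the failure and of the fallback idea in Al's override. In UTF-8, 0xC2 is a lead byte that must be followed by exactly one continuation byte in the range 0x80 to 0xBF (C2 B0, for example, is the degree sign U+00B0), so a 0xC2 followed by anything else is malformed, which fits Mike's "second half is missing" reading. The helper name `lenient_decode` is hypothetical, not part of Encode or MARC::Record: it calls Encode's `decode` with `FB_CROAK | LEAVE_SRC` inside an `eval` and returns the original octets if decoding dies, roughly what Al's patched `Encode::decode` does globally. Unlike the override, it only affects code that calls it directly, so by itself it won't help with decoding done inside MARC::Record.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper (not part of Encode or MARC::Record).
# Strictly decode UTF-8 octets; if the decoder dies on malformed
# input, return the original octets unchanged, similar to the
# fallback in Al's redefined Encode::decode.
sub lenient_decode {
    my ($octets) = @_;
    my $string;
    # FB_CROAK makes decode die on malformed input;
    # LEAVE_SRC stops it from modifying $octets in place.
    my $ok = eval {
        $string = decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC);
        1;
    };
    return $ok ? $string : $octets;
}

# C2 B0 is the valid UTF-8 encoding of U+00B0 DEGREE SIGN.
my $good = "106\xC2\xB0" . "30";
# A lone C2 followed by "3" (0x33, not a continuation byte) is malformed.
my $bad  = "106\xC2" . "30";

for my $octets ($good, $bad) {
    my $result = lenient_decode($octets);
    printf "%d bytes in, %d characters out\n",
        length($octets), length($result);
}
```

Run as-is, it should report 7 bytes in and 6 characters out for the valid string, and 6 and 6 for the malformed one, because the malformed string is passed through undecoded. Note that passing raw octets through hides the bad data rather than repairing it; downstream code may later treat those bytes as Latin-1 characters.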