RE: Invalid UTF-8 characters causing MARC::Record crash.

Doran, Michael D Wed, 18 May 2011 14:57:40 -0700

Hi Al,

> For me I've found the best solution is to leave Encode.pm alone
> and redefine the offending subroutine within my processing script.


This was timely help for me, too, due to problems with fatal errors when 
processing a large file of bibs with MARC::Record.  Thanks!
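
For anyone who wants to see the failure outside MARC::Record, here is a minimal sketch. The sample bytes are made up (a 0xC2 lead byte followed by an ASCII digit rather than a continuation byte), and it assumes the fatal error comes from Encode being asked to croak on malformed input, which may not behave identically on every Encode version:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sample: a 0xC2 lead byte followed by an ASCII digit
# instead of the continuation byte that UTF-8 requires.
my $octets = "W 106\xC2" . "30";

# FB_CROAK asks Encode to die on malformed input;
# LEAVE_SRC keeps $octets from being modified by the call.
my $decoded = eval {
    Encode::decode('UTF-8', $octets, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
my $error = $@;    # capture right away, before anything else can reset it

if (defined $decoded) {
    print "decoded without complaint\n";
}
else {
    print "decode died: $error";
}
```

Wrapping the call in eval is the same idea Al's replacement subroutine relies on: the exception is caught and the script carries on instead of aborting partway through the batch.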

(Although, when I checked, I had Encode.pm version 2.12)
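
A quick way to confirm which Encode.pm a given perl actually loads, sketched here on the assumption that nothing unusual is being done with @INC:

```perl
use strict;
use warnings;
use Encode ();

# Report the version and the on-disk location of the Encode.pm in use.
my $version = Encode->VERSION;
my $path    = $INC{'Encode.pm'};
printf "Encode %s loaded from %s\n", $version, $path;
```

Running this with the same perl binary that runs the MARC scripts matters, since a machine can have more than one perl installed, each with its own Encode.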

-- Michael

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@uta.edu
# http://rocky.uta.edu/doran/


> -----Original Message-----
> From: Al [mailto:ra...@berkeley.edu]
> Sent: Tuesday, May 17, 2011 9:27 AM
> To: Mike Barrett; perl4lib@perl.org
> Subject: Re: Invalid UTF-8 characters causing MARC::Record crash.
> 
>  >Anybody ever see this before?
> 
> All. The. Time.
> 
> When I use Encode.pm version 2.12 I don't have this problem. But it occurs
> repeatedly with version 2.40.
> 
> There are a few different solutions, but I'm assuming that, as in my case,
> it's not practical for you to clean up your MARC records *before* you try to
> process them. So you can downgrade your Encode.pm or modify it to make it
> less demanding. For me I've found the best solution is to leave Encode.pm
> alone and redefine the offending subroutine within my processing script. I
> paste this in at the bottom of every script:
> 
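> package Encode;
> use Encode::Alias;
> # Redefining decode() on purpose, so silence the "redefined" warning.
> no warnings 'redefine';
> 
> sub decode($$;$)
> {
>     my ($name,$octets,$check) = @_;
>     # Keep an untouched copy of the input to fall back on.
>     my $altstring = $octets;
>     return undef unless defined $octets;
>     $octets .= '' if ref $octets;
>     $check ||=0;
>     my $enc = find_encoding($name);
>     unless(defined $enc){
>        require Carp;
>        Carp::croak("Unknown encoding '$name'");
>     }
>     my $string;
>     # Trap the fatal error that invalid UTF-8 would otherwise raise.
>     eval { $string = $enc->decode($octets,$check); };
>     $_[1] = $octets if $check and !($check & LEAVE_SRC());
>     if ($@) {
>        # Decoding failed: hand back the original, undecoded octets.
>        return $altstring;
>     } else {
>        return $string;
>     }
> }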
> 
> But I'll be interested in other solutions people may bring up.
> 
> Good luck!
> 
> Al
> 
> 
> At 5/17/2011, Mike Barrett wrote:
>  >I'm using MARC::Batch and MARC::Field to iterate through a text file of
>  >bibliographic records from Voyager.
>  >
>  >The unrecoverable error is actually occurring in the Perl Unicode module
>  >which is, of course, called by MARC::Record.
>  >It's running into "invalid UTF-8 character 0xC2."
>  >When I looked up the Unicode character list, all of the C2 entries are
>  >four hex characters, so it appears that the second half is missing.
>  >
>  >After looking at the records in Voyager (using Arial Unicode MS font), I
>  >find that all of the problem records I've found are maps with Field 255|a
>  >[scale] |b [projection] |c [geo coordinates].
>  >
>  >Here's an example:
>  >As it appears in the text file:  c(W 106Â¿Â¿Â¿30Â¿Â¿00Â¿Â¿--W
>  >104Â¿Â¿Â¿52Â¿Â¿30Â¿Â¿/N
>  >39Â¿Â¿Â¿22Â¿Â¿30Â¿Â¿--N 37Â¿Â¿Â¿15Â¿Â¿00Â¿Â¿).
>  >As it appears in Voyager Cataloging module:  ‡a Scale 1:126,720 ‡c (W
>  >106⁰30ʹ00ʺ--W 104⁰52ʹ30ʺ/N 39⁰22ʹ30ʺ--N 37⁰15ʹ00ʺ).
>  >
>  >
>  >Thanks,
>  >Mike Barrett
