I haven't seen any responses, so I'll give this a shot.

Warning though, my memory goes a little bit fuzzy about unicode ;).

First, xBF looks suspiciously like part of a byte order mark (aka BOM, which
is EF BB BF in UTF-8). I'd make sure that some change in the process hasn't
started introducing a BOM or otherwise changing the Unicode format. I'm
having some issues opening your screenshot (I'm not sure why you embedded it
into Word? Mayhap just use a jpg or png? Or just copy/paste the hexdump?).
I'd hope your vendor wouldn't be so clueless as to introduce a BOM in a UTF-8
file, but it's a common byproduct of using various Windows libraries. I'd
be kinda surprised if any such file could even be loaded in.
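
If it helps, here's a rough sketch of how I'd check for a BOM and find where
the xBF bytes sit. The helper names (has_utf8_bom, find_byte_offsets) are
made up for illustration, and it assumes you have the record as a plain byte
string, not a decoded character string. Keep in mind that BF is also a
perfectly legal continuation byte inside valid UTF-8, so the offsets only
tell you where to look in the hexdump, not that something is wrong.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the byte string starts with the UTF-8 BOM.
sub has_utf8_bom {
    my ($bytes) = @_;
    return substr($bytes, 0, 3) eq "\xEF\xBB\xBF" ? 1 : 0;
}

# Hypothetical helper: list the offsets of every occurrence of one byte.
# Assumes $bytes is a byte string (no UTF-8 flag), so offsets are byte offsets.
sub find_byte_offsets {
    my ($bytes, $byte) = @_;
    my @offsets;
    my $pos = 0;
    while (($pos = index($bytes, $byte, $pos)) >= 0) {
        push @offsets, $pos;
        $pos++;
    }
    return @offsets;
}
```

Something like find_byte_offsets($bib_marc, "\xBF") would then give you the
positions to inspect.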

Second, I'd probably try using binmode and perhaps even flag the incoming
data as :raw. Very strange things can happen when parts of the chain try to
convert data that is already UTF-8 into UTF-8 again (aka double-encoding).
However, binmode only applies to filehandles, and since your data comes over
a database connection, you might need to look at how the driver handles the
encoding instead.
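
To show what I mean by double-encoding, here's a small sketch using the core
Encode module. The helper names (is_valid_utf8, double_encode) are my own
invention, and this only illustrates the general effect; I'm not claiming
it's what is happening in your chain.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: true if the byte string is well-formed UTF-8.
# LEAVE_SRC keeps decode() from modifying the caller's string in place.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $ok = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
        1;
    };
    return $ok ? 1 : 0;
}

# Hypothetical helper: what happens when bytes that are already UTF-8 get
# treated as Latin-1 characters and encoded to UTF-8 a second time.
sub double_encode {
    my ($utf8_bytes) = @_;
    return encode('UTF-8', $utf8_bytes);
}

my $copyright = "\xC2\xA9";                        # UTF-8 for the copyright sign
print unpack('H*', double_encode($copyright)), "\n";
print is_valid_utf8("\xBF") ? "ok\n" : "lone BF is not valid UTF-8\n";
```

The first print shows the two bytes C2 A9 turning into four bytes, and the
second shows that a BF on its own (with no lead byte in front of it) is
rejected by a strict decode, which is the same kind of failure as the error
you're seeing.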

Have you changed the way your perl script connects to the database? Updated
the perl libraries or the freetds/odbc libraries?

I'm also a little confused. So did the test record you made replicate the
issue w/ the Unicode in a record that previously worked? Or did it not?

Sorry, I fear I have more questions than advice...

Are you able to compare the raw record given by the vendor with what's
stored in Voyager? I wouldn't do any manipulation; I'd just have another
script do something like:

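my $bib_marc = &get_bib_string($dbh, $dbase, $bib_id);
# Write the bytes exactly as they came back, with no encoding layer applied.
open my $dump_file, '>', 'dump_file.mrc' or die "Cannot open dump_file.mrc: $!";
binmode($dump_file, ':raw');
print $dump_file $bib_marc;
close $dump_file;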
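
And if you'd rather paste a hexdump into an email than attach a screenshot,
something along these lines should do. It's only a sketch: hexdump is a
hypothetical helper, not part of any MARC library, and it assumes it is
handed a byte string.

```perl
use strict;
use warnings;

# Hypothetical helper: format a byte string as "offset  hex bytes" lines,
# 16 bytes per line.
sub hexdump {
    my ($bytes) = @_;
    my @lines;
    for (my $off = 0; $off < length $bytes; $off += 16) {
        my $chunk = substr($bytes, $off, 16);
        my $hex = join ' ', map { sprintf('%02X', ord($_)) } split //, $chunk;
        push @lines, sprintf('%08X  %s', $off, $hex);
    }
    return join("\n", @lines) . "\n";
}

print hexdump("c\xC2\xA92015");
```

You could call hexdump($bib_marc) on the string straight out of
get_bib_string and compare it with the same dump of the vendor's record.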




Jon G

On Thu, Sep 17, 2015 at 1:53 PM, Highsmith, Anne L <hism...@library.tamu.edu
> wrote:

> We just sent out our database to be RDA-ified and have reloaded it. We’re
> a Voyager site, so now I have to deal with the deadly problem of deleting
> the 035 matching 001 that is added by the bib load process. (Voyager users
> will know what I mean).
>
>
>
> Deleting that 035 is not the problem; I already had a program that did
> that. But when I went to test that program this morning against the new
> data, I found that it frequently exited with this error:
>
> “utf8 "\xBF" does not map to Unicode at C:/Perl/lib/Encode.pm line 200.”
>
>
>
> I discovered that the problematic field was a newly-added 264 with the
> copyright symbol in 264$c. I printed out several sample records that
> received this error and looked at the binary dumps. For the copyright
> symbol, they all had C2 A9, which appears to me to be the proper code point
> (am I using that correctly?) I added a copyright symbol to a test record
> online using the voyager client, dumped it and looked at it, and saw that
> the hex value for the copyright symbol added online was also C2 A9. So it
> looks to me as though the RDA vendor used the character when adding the
> copyright symbol. Or did they?
>
>
>
> So, what do I do to keep the program from blowing up? My program works by
> creating a marc record string from marc record pieces stored in the
> database, composing a marc record object from that string, changing the
> object, changing it back to a string, then updating the database. I am not
> working with an external file. The &get_bib_string is a locally written
> subroutine that gets the pieces of the marc record as they’re stored in the
> database and concatenates them into a string. I’ve been using it for years
> with no error, so that shouldn’t be the problem.
>
>
>
> The bulk of my code follows below the line. The error is generated by the
> call to the new_from_usmarc function
>
>     my $bib_marc = &get_bib_string($dbh, $dbase, $bib_id);
>     my $record = MARC::Record->new_from_usmarc($bib_marc);
>     my @a035 = $record->field('035');
>     foreach my $f035 (@a035) {
>         if (my $f035a = $f035->subfield('a')) {
>             if ($f035a eq $bib_id) {
>                 $record->delete_field($f035);
>                 $change_flag = 1;
>             }
>         }
>
> … [store updated marc record string to database]
>
>
> ____________________________________________________________________________________________________________________________________________________________________________________________________________
>
> The attachment shows the dump of part of the record; the copyright symbol
> appears in the 6th line from the top.
>
