Hey, that’s my post! Anyway, I haven’t really looked into what your problem 
is, but when you said that the copyright character is getting transformed to A9 
even though it is supposedly stored as C2 A9 in the database, it made me think 
of how some characters can have two Unicode representations (precomposed and 
decomposed), each with its own UTF-8 byte sequence. I wonder if something like 
that is happening for you.
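
To make the byte values concrete, here is a minimal sketch, assuming only the core Encode and Unicode::Normalize modules. The helper name hex_bytes is made up for illustration. It prints the bytes for the copyright sign in Latin-1 and in UTF-8, then the two normalization forms of an accented letter. As far as I know the copyright sign has no decomposed form, so a bare A9 looks more like Latin-1 output than a second Unicode representation, but that is a guess about this case.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFC NFD);

# Hypothetical helper: render a byte string as space-separated uppercase hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

my $copyright = "\x{A9}";    # U+00A9 COPYRIGHT SIGN, as a character

# The same character, two encodings.
print hex_bytes(encode('ISO-8859-1', $copyright)), "\n";    # A9
print hex_bytes(encode('UTF-8',      $copyright)), "\n";    # C2 A9

# A character that does have two Unicode representations.
my $e_acute = "\x{E9}";      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
print hex_bytes(encode('UTF-8', NFC($e_acute))), "\n";      # C3 A9 (precomposed)
print hex_bytes(encode('UTF-8', NFD($e_acute))), "\n";      # 65 CC 81 (e + combining acute)
```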

Shelley


Shelley Doljack
Discovery Metadata Librarian
Metadata Dept., Lathrop Library
Stanford University Libraries
650-725-0167
sdolj...@stanford.edu



From: Highsmith, Anne L [mailto:hism...@library.tamu.edu]
Sent: Friday, November 13, 2015 2:05 PM
To: perl4lib@perl.org
Subject: Opening & writing to UTF-8 files; copyright symbol again -- solution

I should probably say, “apparent solution” ‘cause character set issues never 
seem to end.

However, combining Jon Gorman’s recommendation with some Googling, I get:

my $outfile='4788022.edited.bib';
open (my $output_marc, '>', $outfile) or die "Couldn't open file $!" ;
binmode($output_marc, ':utf8');

The open statement may not be quite correct, as I am not familiar with the more 
current techniques for opening file handles that Jon mentioned. However, when 
I use those instructions to open the output file rather than what I had before, 
the copyright symbol does indeed come across as C2 A9 as it was in the original 
record. I didn’t want to use the utf8, because I’ve tried that before and ended 
up with double-encoding (and a real mess). But I’ll continue testing.
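
One possible explanation for the earlier double-encoding, sketched under assumptions: if the data reaching the output handle is still raw UTF-8 bytes rather than decoded characters, a UTF-8 output layer encodes it a second time. The helper name written_hex is hypothetical, and an in-memory file stands in for the real output file. The sketch uses ':encoding(UTF-8)', which the Perl documentation generally recommends over ':utf8'; for output the two should produce the same bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: write $text through a UTF-8 output layer into an
# in-memory buffer and return the bytes that were written, as hex.
sub written_hex {
    my ($text) = @_;
    my $buffer = '';
    open(my $fh, '>:encoding(UTF-8)', \$buffer)
        or die "Couldn't open in-memory file: $!";
    print {$fh} $text;
    close($fh) or die "Couldn't close in-memory file: $!";
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $buffer;
}

# The copyright sign as two raw UTF-8 bytes (not yet decoded).
my $raw_bytes  = "\xC2\xA9";
# The same data decoded into a single character, U+00A9.
my $characters = decode('UTF-8', $raw_bytes);

print written_hex($characters), "\n";    # C2 A9        (encoded once)
print written_hex($raw_bytes),  "\n";    # C3 82 C2 A9  (double-encoded)
```

If the record data is already decoded to characters by the time it is printed, the layer gives C2 A9, which would match the result described above.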

The results of the googling I referred to can be found at: 
https://groups.google.com/forum/#!topic/perl.perl4lib/sy7hqiBQ1yM


Anne L. Highsmith
Director, Consortia Systems
TAMU Libraries
5000 TAMU
College Station, TX   77843-5000
979 862 4234
hism...@tamu.edu
