First off, it's entirely possible that you have bad UTF-8 (perhaps rogue
MARC-8, perhaps just lousy characters) in your MARC. I know we have plenty
of that crap.

You need to tell Perl that you'll be outputting UTF-8 using 'binmode':

  binmode(FILE, ':utf8');

In general, you'll want to do this for basically every file you open for
reading or writing.
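Here's a minimal sketch of both ways to get a UTF-8 layer on a handle (the
filenames are just placeholders): binmode() on an already-open handle, or the
stricter :encoding(UTF-8) layer passed straight to open().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Option 1: open, then mark the handle as UTF-8 output.
open(my $fh, '>', 'out.mrc') or die "can't open out.mrc: $!";
binmode($fh, ':utf8');

# Option 2: do it in one step with the stricter encoding layer,
# which also validates the bytes rather than passing junk through.
open(my $fh2, '>:encoding(UTF-8)', 'out2.mrc') or die "can't open out2.mrc: $!";

# A wide character prints cleanly now -- no "Wide character in print" warning.
print $fh "caf\x{e9}\n";
print $fh2 "caf\x{e9}\n";

close($fh);
close($fh2);
```

The same binmode() call works on an old-style bareword handle like FILE, which
is what the script below is using.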

A great overview of Perl and UTF-8 can be found at:

http://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default

On Mon, Jul 30, 2012 at 6:51 PM, Shelley Doljack <sdolj...@stanford.edu> wrote:

> Hi,
>
> I wrote a script that extracts marc records from a file given certain
> conditions and puts them in a new file. When my input record is correctly
> encoded in UTF-8 and I run my script from windows command prompt, this
> warning message appears: "Wide character in print at record_extraction.pl
> line 99" (the line in my script where I print to a new file using
> as_usmarc). I compared the extracted record before and after in MarcEdit
> and the diacritic was changed. I tried marcdump newfile.mrc to see what
> happens and I get this error: "utf8 \xF4 does not map to Unicode at
> C:/Perl64/lib/Encode.pm line 176." When I run my extraction script again
> with MARC-8 encoded data then I don't have the same problem.
>
> The basic outline of my script is:
>
> my $batch = MARC::Batch->new('USMARC', $input_file);
>
> while (my $record = $batch->next()) {
>      #do some checks
>      #if checks ok then
>      print FILE $record->as_usmarc();
> }
>
> Do I need to add something that specifies to interpret the data as UTF-8?
> Does MARC::Record not handle UTF-8 at all?
>
> Thanks,
> Shelley
>
> ----
> Shelley Doljack
> E-Resources Metadata Librarian
> Metadata and Library Systems
> Stanford University Libraries
> sdolj...@stanford.edu
> 650-725-0167
>



-- 

Bill Dueber
Programmer -- Library Systems
University of Michigan
