date:20120801

RE: printing UTF-8 encoded MARC records with as_usmarc

2012-08-01 Thread PHILLIPS M.E.

 -Original Message-
 From: Shelley Doljack [mailto:sdolj...@stanford.edu]
 Sent: 31 July 2012 20:18

 The problem was I wasn't telling perl to output UTF-8. Now that I added
 binmode(FILE, ':utf8') to my script, the problem is fixed. However, it sounds
 like once I set binmode to UTF-8 everything will be interpreted as such, even
 when the record is in MARC-8. Is that right? So this means that I can only use
 my script with a file of records where all of them are encoded in UTF-8. If I
 want to run the script against a file with all MARC-8 encoding, then I'd need
 to remove the binmode line.

It depends how much manipulation of the records you are doing in the script.  
One approach is to use

binmode(FILE, ':raw');

for both input and output.  Perl will then keep the bytes of the records 
exactly as they are.  You won't be able to test for exotic characters so 
easily, and amending field content would be inadvisable, but if all you are 
doing is something like reading in the records and filtering out any that have 
no 245 field, or something fairly basic like that, this could be the best 
approach.
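
As a rough sketch of that byte-level approach, the following uses only core Perl and reads the directory inside each record rather than going through MARC::Record. The helper names has_tag and filter_records are made up for illustration, and the code assumes well-formed MARC 21 records: a 24-byte leader with the base address of data at positions 12-16, 12-byte directory entries, and a \x1D record terminator.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the raw record's directory lists $tag.
# It only looks at bytes, so it does not matter whether the field data
# is MARC-8 or UTF-8.
sub has_tag {
    my ($raw, $tag) = @_;
    return 0 if length($raw) < 25;
    my $base = substr($raw, 12, 5);            # leader/12-16: base address of data
    return 0 unless $base =~ /^\d{5}$/ && $base >= 25;
    # The directory sits between the leader and the field terminator
    # that immediately precedes the base address.
    my $dir = substr($raw, 24, $base - 24 - 1);
    for (my $i = 0; $i + 12 <= length($dir); $i += 12) {
        return 1 if substr($dir, $i, 3) eq $tag;
    }
    return 0;
}

# Hypothetical helper: copy every record that has a 245 field from $in
# to $out, byte for byte, and return how many records were kept.
sub filter_records {
    my ($in, $out) = @_;
    binmode($in,  ':raw');
    binmode($out, ':raw');
    local $/ = "\x1D";                         # read one MARC record at a time
    my $kept = 0;
    while (my $raw = <$in>) {
        next unless has_tag($raw, '245');
        print {$out} $raw;
        $kept++;
    }
    return $kept;
}

# Example use (file names are placeholders):
#   open my $in,  '<', 'input.mrc'  or die $!;
#   open my $out, '>', 'output.mrc' or die $!;
#   my $kept = filter_records($in, $out);
```

Because nothing is ever decoded, the same script can be run unchanged against a MARC-8 file, a UTF-8 file, or a mixture of the two.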

The MARC::Record module does not seem to care how the records are encoded.  
It's only once you start altering field content, testing field content, or 
adding fields that the character set being used becomes an issue.  Removing 
fields would be fine too.

MARC-8 can be very complex, particularly if other code tables like CJK are 
invoked, or even just Greek or Cyrillic.  If you were manipulating field 
content in that kind of way, then converting everything to UTF-8 would make 
things very much easier.
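
Before converting anything, a script can check which encoding each record declares. In MARC 21, leader position 9 is "a" for UCS/Unicode records and a blank for MARC-8. The sketch below (the function name declared_encoding is hypothetical) reads that one byte from the raw record. It only reports what the record claims, which is not always what the data really contains. The conversion itself is not shown; the MARC::Charset module's marc8_to_utf8 function is one option for that step.

```perl
use strict;
use warnings;

# Hypothetical helper: report the encoding a raw MARC 21 record declares
# in leader position 9 ('a' = UCS/Unicode, blank = MARC-8).
sub declared_encoding {
    my ($raw) = @_;
    return 'unknown' if length($raw) < 24;     # too short to hold a full leader
    my $flag = substr($raw, 9, 1);
    return $flag eq 'a' ? 'UTF-8'
         : $flag eq ' ' ? 'MARC-8'
         :                'unknown';
}
```

A script could use this to send MARC-8 records through a conversion step and leave UTF-8 records alone, so one script handles both kinds of file.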

Matthew

-- 
Matthew Phillips
Electronic Systems Librarian, Durham University
Durham University Library, Stockton Road, Durham, DH1 3LY
+44 (0)191 334 2941

Re: printing UTF-8 encoded MARC records with as_usmarc

2012-08-01 Thread Colin Campbell

On Tue, Jul 31, 2012 at 09:25:55AM -0400, Smith,Devon wrote:
 I just recently came across this presentation which lays out pretty much all 
 the issues with Unicode in perl, and makes some recommendations for best 
 practices. You may find some general insight into the whole situation by 
 going over it.
In the course of preparing the latest edition of the Camel book, Tom
Christiansen created a Perl Unicode Cookbook; see
http://www.perl.com/pub/2012/04/perlunicook-standard-preamble.html

It's available in a few different places on the web.

C.

-- 
Colin Campbell
Chief Software Engineer,
PTFS Europe Limited
Content Management and Library Solutions
+44 (0) 800 756 6803 (phone)
+44 (0) 7759 633626  (mobile)
colin.campb...@ptfs-europe.com
skype: colin_campbell2

http://www.ptfs-europe.com
