Hi,
On Wed, Mar 27, 2013 at 2:11 PM, Eric Lease Morgan <emor...@nd.edu> wrote:
> Put another way, how can I determine whether or not position #9 of a given
> MARC leader is accurate? If position #9 is an "a", then how can I read the
> balance of the record to determine whether or not all the characters really
> and truly are UTF-8 encoded?

The following program reads a file of MARC records from standard input and classifies each one as either valid UTF-8 or not.

___START____
#!/usr/bin/perl
use strict;
use warnings;
use Encode;

binmode STDIN, ':raw';   # read undecoded bytes
$/ = "\035";             # MARC record terminator (0x1D)

my $i = 0;
while (my $bytes = <STDIN>) {
    $i++;
    # 'UTF-8' is Encode's strict decoder; FB_CROAK makes decode()
    # die on the first malformed byte sequence, which sets $@.
    eval { Encode::decode('UTF-8', $bytes, Encode::FB_CROAK); };
    if ($@) {
        print "Record $i is definitely not valid UTF-8\n";
    } else {
        print "Record $i is valid UTF-8\n";
    }
}
___END____

Regards,

Galen
--
Galen Charlton
gmcha...@gmail.com
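The program above answers "is this record valid UTF-8?"; to answer the original question about position 9, that result can be compared with the leader itself. What follows is a minimal sketch, assuming MARC 21 semantics (Leader/09 "a" means UCS/Unicode, blank means MARC-8) and records read as raw bytes; the `classify` helper and its result codes are hypothetical names. Note that a record containing only ASCII is valid UTF-8 and valid MARC-8 alike, so "valid UTF-8" alongside a blank Leader/09 is not necessarily an error.

```perl
#!/usr/bin/perl
# Sketch: compare each record's Leader/09 (character coding scheme)
# with whether its bytes actually decode as strict UTF-8.
# Assumes MARC 21: Leader/09 'a' = UCS/Unicode, blank = MARC-8.
use strict;
use warnings;
use Encode ();

# Hypothetical result codes and their human-readable meaning.
my %DESCRIPTION = (
    'ok-utf8'     => 'Leader/09 is "a" and the bytes are valid UTF-8',
    'bad-utf8'    => 'Leader/09 is "a" but the bytes are NOT valid UTF-8',
    'ok-non-utf8' => 'Leader/09 is not "a" and the bytes are not valid UTF-8',
    'ambiguous'   => 'Leader/09 is not "a" yet the bytes are valid UTF-8 (e.g. pure ASCII)',
);

# classify(): hypothetical helper; takes one raw MARC record (bytes)
# and returns one of the keys of %DESCRIPTION.
sub classify {
    my ($bytes) = @_;
    # Leader positions are 0-based, so Leader/09 is the 10th byte.
    my $claims_utf8 = length($bytes) >= 10 && substr($bytes, 9, 1) eq 'a';
    my $copy = $bytes;   # decode() with a CHECK flag may modify its input
    my $valid = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
    if ($claims_utf8) {
        return $valid ? 'ok-utf8' : 'bad-utf8';
    }
    return $valid ? 'ambiguous' : 'ok-non-utf8';
}

# Main loop: read raw records from standard input.
binmode STDIN, ':raw';
{
    local $/ = "\035";   # MARC record terminator (0x1D)
    my $n = 0;
    while (my $rec = <STDIN>) {
        $n++;
        printf "Record %d: %s\n", $n, $DESCRIPTION{ classify($rec) };
    }
}
```

Only the "bad-utf8" case shows that Leader/09 is definitely wrong; "ambiguous" records may deserve a closer look (for example, by checking whether they contain any non-ASCII bytes at all).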