> I have to admit my Perl skill is very limited, so this may be a dumb
> question, but I can't seem to find an answer. When I use MARC::Batch to
> read records from our catalog (III) export file, I can't seem to find a
> way to skip an error record. When I ran the following against an III
> export MARC file, it stopped at a record with an error.
>
> utf8 "\xBC" does not map to Unicode at /usr/lib/perl/5.10/Encode.pm line 174.
I'm surprised that the error line is being reported from the Encode module. Usually modules are written so that an error report tells you whereabouts the error occurred in the code that was *using* a feature provided by the module; as it stands, it is harder to tell exactly which line of your own script is triggering the error. I guess that the author of the Encode module really did not expect this to happen.

> Ideally I would like to be able to log the error and move to the next
> record.

In general you can trap errors by using the "eval" construct:

    eval {
        # ... code that might trigger an error in here ...
    };
    warn $@ if $@;   # report any error from the eval block as a warning

See http://perldoc.perl.org/functions/eval.html

Put something like that around the part of your code that triggers the error and you should get a bit further.

One thing to ask, of course, is why there is an error in the first place! It looks like the MARC record is not being converted with the right character set. I see you have set strict to be off for the batch.

We have a Millennium system here, and the internal coding is MARC8 rather than UTF-8. I've found that Innovative has a sort of hack to allow arbitrary Unicode characters to be carried in the MARC record. We notice this particularly with records containing directional quotation marks. One of the effects is that byte values such as 0x1D can occur mid-record, but the MARC::File::USMARC module assumes that 0x1D is the end-of-record marker.
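To make the eval idea concrete, here is a minimal, self-contained sketch of a log-and-skip loop. The next_record() function below is a stand-in of my own invention for a call like $batch->next, so that the example runs on its own; the log message format is likewise made up:

```perl
use strict;
use warnings;

# Stand-in for a MARC reader: returns records in turn, dies on a bad one,
# and returns undef when the input is exhausted.
my @records = ('rec1', 'BAD', 'rec2');
my $i = 0;
sub next_record {
    my $r = $records[ $i++ ];
    return undef unless defined $r;
    die "utf8 error in record $i\n" if $r eq 'BAD';
    return $r;
}

my @good;
while (1) {
    my $record = eval { next_record() };
    if ($@) {
        warn "skipping record: $@";    # log the error...
        next;                          # ...and move on to the next record
    }
    last unless defined $record;       # end of input
    push @good, $record;
}
print scalar(@good), " good records\n";
```

To adapt this to a real batch, replace next_record() with $batch->next (with strict turned off, as in your script). One caveat: if the underlying reader dies without advancing its file position, the same record can fail repeatedly, so in practice you may want to bail out after several consecutive failures.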
To get the module to split the records accurately I had to modify the module as follows. Change the lines in USMARC.pm that say

    local $/ = END_OF_RECORD;
    my $usmarc = <$fh>;

to instead say

    ######################################################################
    # Altered by Matthew Phillips to cope with 0x1D within field values
    ######################################################################
    # local $/ = END_OF_RECORD;
    # my $usmarc = <$fh>;
    my $length;
    read($fh, $length, 5) || return;
    return unless $length >= 5;
    my $record;
    read($fh, $record, $length - 5) || return;
    my $usmarc = $length . $record;
    ######################################################################
    # End of alteration
    ######################################################################

You should then get all records being split at the right places. The alteration relies on the byte count at the start of the record being accurate. It works nicely for Innovative record output, but if you're going to be reading records from other sources it may not help, as the byte count can be unreliable.

I submitted the patch to the module maintainer a few weeks ago and he was considering how to incorporate it as an option, as it's not appropriate in all circumstances.

There may well still be character conversion issues, however, because the MARC::Charset module does not know about Innovative's encoding, and is slightly broken in other respects. I have not written a patch for this aspect yet.

Hope that helps a bit!

-- 
Matthew Phillips
Electronic Systems Librarian, Durham University
Durham University Library, Stockton Road, Durham, DH1 3LY
+44 (0)191 334 2941