I've been converting MARC XML records into USMARC and recently had a slew of bad records which MarcEdit reported as having invalid leaders. After a few days of puzzling over this and blaming it all on Unicode, I noticed that every one of them contained newlines (0D 0A) in its datafields. As far as I know, newlines aren't illegal in USMARC, but when I replaced them with spaces, sure enough, the problem disappeared.
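For anyone who wants to try the same replacement outside the parser, here is a minimal sketch of the normalization on a plain string. It does not use the MARC modules at all, and the helper name `collapse_newlines` is hypothetical, made up for this illustration. Unlike the patch further down, it also handles a bare or leading carriage return, on the assumption that the data might not come through an XML parser that has already normalized CR LF to LF.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MARC::Record or MARC::File::XML):
# turn every CR LF, LF, or bare CR into a space, then collapse runs of
# spaces so that an indented continuation line leaves a single space.
sub collapse_newlines {
    my ($text) = @_;
    $text =~ s/\r?\n|\r/ /g;
    $text =~ s/ {2,}/ /g;
    return $text;
}

# The title from the test record, with a hard return after "Technological"
# and some indentation on the continuation line.
my $title = "Theoretical and Technological\r\n      Aspects of Crystal Growth";
print collapse_newlines($title), "\n";
```

This only changes the data; it says nothing about why the newlines upset the leader in the first place.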
Test record:

    <?xml version="1.0" encoding="utf-8"?>
    <collection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.loc.gov/MARC21/slim">
      <record>
        <leader>06965nam a2202005 u 4500</leader>
        <datafield tag="245" ind1="0" ind2="0">
          <subfield code="a">Theoretical and Technological
          Aspects of Crystal Growth</subfield>
        </datafield>
      </record>
    </collection>

(If your mail viewer mangles lines, there's a hard return (0D 0A) after the word Technological in the 245)

Here is my test program which illustrates the problem:

    use MARC::Batch;
    use MARC::File::XML (BinaryEncoding => 'utf8', RecordFormat => 'UNIMARC');
    use strict 'vars';

    open (MARCOUT, ">test_out.marc") or die "Couldn't open test_out.marc for writing: $!\n";
    binmode(MARCOUT, ':utf8');

    my $batch = new MARC::Batch ('XML', 'test.xml');
    my $record = $batch->next;
    print MARCOUT $record->as_usmarc;

As I said, I don't think newlines are illegal in USMARC so I rather suspect the problem is somewhere in MARC::Record. I took the easier route though and replaced them with spaces in MARC::File::SAX and that solves the problem:

    sub characters {
        my ( $self, $chars ) = @_;
        if ( ( exists $self->{ subcode } && $self->{ subcode } ne '')
            || ( $self->{ tag } && ( $self->{ tag } eq 'LDR' || $self->{ tag } < 10 )) ) {
            $self->{ chars } .= $chars->{ Data };
            ## Added by me, 1/11/2011
            $self->{ chars } =~ s/\n/ /g;
            $self->{ chars } =~ s/ {2,}/ /g;
        }
    }

So is this a bug that can be officially fixed or am I overlooking something?

ActiveState perl 5.10, MARC::Record v.2.0.3, MARC::File::XML v. 0.93

Arvin