Saiful Amin
Mon, 25 Jan 2010 10:18:59 -0800
I also recommend using MARC::Batch. Attached is a simple script I wrote for myself.

Saiful Amin
+91-9343826438

On Mon, Jan 25, 2010 at 8:33 PM, Robert Fox <rf...@nd.edu> wrote:
> Assuming that memory won't be an issue, you could use MARC::Batch to
> read in the record set and print out separate files where you split on
> every X records. You would have an iterative loop loading each
> record from the large batch, and a counter variable that would get
> reset after X records. You might want to name the sets using
> another counter that keeps track of how many sets you have and name
> each file something like batch_$count.mrc and write them out to a
> specific directory. Just concatenate each record to the previous one
> when you're making your smaller batches.
>
> Rob Fox
> Hesburgh Libraries
> University of Notre Dame
>
> On Jan 25, 2010, at 9:48 AM, "Nolte, Jennifer"
> <jennifer.no...@yale.edu> wrote:
>
> > Hello,
> >
> > I am working with files of MARC records that are over a million
> > records each. I'd like to split them into smaller chunks,
> > preferably using a command line. MARCedit works, but is slow and
> > made for the desktop. I've looked around and haven't found anything
> > truly useful. Endeavor's MARCsplit comes close but doesn't split
> > files into even numbers of records, only by matching criteria, so
> > there could be lots of record duplication between files.
> >
> > Any idea where to begin? I am a (super) novice Perl person.
> >
> > Thank you!
> >
> > ~Jenn Nolte
> >
> > Jenn Nolte
> > Applications Manager / Database Analyst
> > Production Systems Team
> > Information Technology Office
> > Yale University Library
> > 130 Wall St.
> > New Haven CT 06520
> > 203 432 4878
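Rob Fox's suggestion in the quoted message (a loop with a counter that starts a new batch_$count.mrc file every X records) can be sketched in Perl roughly as follows. This is a minimal sketch under stated assumptions, not a tested production tool: the script name split_marc.pl and the function split_marc are hypothetical, and instead of MARC::Batch it swaps in a simpler technique, reading raw records by setting Perl's input record separator $/ to the MARC record terminator byte (0x1D). Records are copied byte for byte and are not parsed or validated, and the output directory is assumed to exist already. Because only one record is held in memory at a time, a million-record file should not be a problem.

```perl
#!/usr/bin/perl
# split_marc.pl (hypothetical name): split a large MARC file into files
# of CHUNK_SIZE records each, named batch_1.mrc, batch_2.mrc, ...
#
# Assumption: input is plain ISO 2709 / MARC 21 data where every record
# ends with the record terminator byte 0x1D. Records are copied as-is;
# they are NOT parsed or validated (unlike MARC::Batch).
use strict;
use warnings;
use File::Spec;

use constant RECORD_TERMINATOR => "\x1D";

# Splits $infile into $outdir/batch_N.mrc; returns the number of files written.
sub split_marc {
    my ($infile, $chunk_size, $outdir) = @_;
    die "Chunk size must be a positive integer\n"
        unless defined $chunk_size && $chunk_size =~ /^\d+$/ && $chunk_size > 0;
    $outdir = '.' unless defined $outdir;    # output dir must already exist

    open my $in, '<:raw', $infile or die "Cannot open $infile: $!\n";
    local $/ = RECORD_TERMINATOR;    # each <$in> now returns one record

    my $count    = 0;    # number of batch files opened so far
    my $in_batch = 0;    # records written to the current batch file
    my $out;             # handle for the current batch file

    while (my $record = <$in>) {
        $record =~ s/^[\r\n]+//;     # drop line breaks some files put between records
        next unless length $record;  # e.g. a trailing newline after the last record

        # Open a new file before the first record and after every $chunk_size records.
        if (!defined $out || $in_batch == $chunk_size) {
            if (defined $out) {
                close $out or die "Cannot close batch_$count.mrc: $!\n";
            }
            $count++;
            my $path = File::Spec->catfile($outdir, "batch_$count.mrc");
            open $out, '>:raw', $path or die "Cannot write $path: $!\n";
            $in_batch = 0;
        }
        print {$out} $record;    # concatenate onto the current batch
        $in_batch++;
    }
    if (defined $out) {
        close $out or die "Cannot close batch_$count.mrc: $!\n";
    }
    close $in;
    return $count;
}

# Command-line use: perl split_marc.pl INFILE CHUNK_SIZE [OUTDIR]
if (@ARGV) {
    my ($infile, $chunk_size, $outdir) = @ARGV;
    my $files = split_marc($infile, $chunk_size, $outdir);
    warn "Wrote $files file(s)\n";
}
```

For example, `perl split_marc.pl big.mrc 50000 chunks` would write chunks/batch_1.mrc, chunks/batch_2.mrc, and so on, each holding up to 50,000 records (the last file gets whatever is left over). If you need to look inside records or skip malformed ones while splitting, MARC::Batch, as in the attached script, is the better tool.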
#!c:/perl/bin/perl.exe
#
# Name: mbreaker.pl
# Version: 0.1
# Date: Jan 2009
# Author: Saiful Amin <sai...@edutech.com>
#
# Description: Extract a range of MARC records by position, based on command-line parameters
use strict;
use warnings;
use Getopt::Long;
use MARC::Batch;
my $start = 0;
my $end = 1;
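GetOptions(
    "start=i" => \$start,    # position of the first record to print
    "end=i"   => \$end,      # position of the last record to print
) or die "Usage: $0 [-start N] [-end M] file\n";
die "Usage: $0 [-start N] [-end M] file\n" unless @ARGV;

my $batch = MARC::Batch->new('USMARC', $ARGV[0]);
$batch->strict_off();      # keep going past records MARC::Batch considers bad
$batch->warnings_off();    # don't print MARC::Batch warnings to STDERR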
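binmode(STDOUT);    # write raw bytes; avoids CRLF translation on Windows
my $num = 0;        # position (1-based) of the record just read
while (my $record = $batch->next()) {
    $num++;
    next if $num < $start;    # skip records before the range
    last if $num > $end;      # stop reading once past the range
    print $record->as_usmarc();
    # Progress report goes to STDERR so it doesn't mix with the MARC output
    warn "$num records\n" if $num % 1000 == 0;
}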
__END__
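=head1 NAME

mbreaker.pl - extract a range of records from a MARC file

Breaks the MARC record file at the start and end positions given on the
command line, printing records start through end (counted from 1,
inclusive) to STDOUT. Redirect the output to a file to save the records.

=head1 SYNOPSIS

mbreaker.pl [options] file

Options:

    -start    position of the first record to print (default: first record)
    -end      position of the last record to print (default: 1)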