I also recommend using MARC::Batch. Attached is a simple script I wrote for
myself.

Saiful Amin
+91-9343826438


On Mon, Jan 25, 2010 at 8:33 PM, Robert Fox <rf...@nd.edu> wrote:

> Assuming that memory won't be an issue, you could use MARC::Batch to
> read in the record set and print out separate files, splitting every
> X records. You would have a loop that loads each record from the large
> batch in turn, and a counter variable that gets reset after every X
> records. You might want to name the sets using a second counter that
> keeps track of how many sets you have, naming each file something like
> batch_$count.mrc and writing them out to a specific directory. Just
> concatenate each record to the previous one when you're building your
> smaller batches.
>
> Rob Fox
> Hesburgh Libraries
> University of Notre Dame
>
> On Jan 25, 2010, at 9:48 AM, "Nolte, Jennifer"
> <jennifer.no...@yale.edu> wrote:
>
> > Hello-
> >
> > I am working with files of MARC records that are over a million
> > records each. I'd like to split them into smaller chunks,
> > preferably from the command line. MarcEdit works, but is slow and
> > made for the desktop. I've looked around and haven't found anything
> > truly useful; Endeavor's MARCsplit comes close but doesn't split
> > files into equal-sized chunks, only by matching criteria, so there
> > could be lots of record duplication between files.
> >
> > Any idea where to begin? I am a (super) novice Perl person.
> >
> > Thank you!
> >
> > ~Jenn Nolte
> >
> >
> > Jenn Nolte
> > Applications Manager / Database Analyst
> > Production Systems Team
> > Information Technology Office
> > Yale University Library
> > 130 Wall St.
> > New Haven CT 06520
> > 203 432 4878
> >
> >
>
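Rob's counter-based approach can be done in a single pass over the file. The sketch below is hypothetical (split_marc.pl, the split_marc function, and its arguments are illustrative names, not part of any library) and deliberately swaps in a simpler technique than MARC::Batch: it never parses the records. Instead it reads the file in raw mode with Perl's input record separator set to 0x1D, the ISO 2709 record terminator, so each read returns one complete record, and copies the bytes unchanged into batch_1.mrc, batch_2.mrc, and so on. Only one record is held in memory at a time. It assumes a well-formed file and an output directory that already exists.

```perl
#!/usr/bin/perl
# split_marc.pl -- hypothetical helper, not part of MARC::Batch.
# Splits a MARC (ISO 2709) file into files of at most N records each.
use strict;
use warnings;

# Copy the records in $in_path into $out_dir/batch_1.mrc, batch_2.mrc, ...
# with at most $per_file records per file. Returns the number of files written.
sub split_marc {
    my ($in_path, $out_dir, $per_file) = @_;
    die "per_file must be positive\n" unless $per_file > 0;

    open my $in, '<:raw', $in_path or die "Cannot read $in_path: $!\n";
    # 0x1D ends every MARC record, so each readline returns one whole record.
    local $/ = "\x1D";

    my ($count, $in_batch, $out) = (0, 0, undef);
    while (my $rec = <$in>) {
        # Skip trailing bytes that lack a terminator (e.g. a final newline).
        next unless $rec =~ /\x1D\z/;
        # Start a new output file at the beginning and after every $per_file records.
        if (!$out || $in_batch == $per_file) {
            close $out if $out;
            $count++;
            my $path = "$out_dir/batch_$count.mrc";
            open $out, '>:raw', $path or die "Cannot write $path: $!\n";
            $in_batch = 0;
        }
        print {$out} $rec;    # concatenated raw records form a valid MARC file
        $in_batch++;
    }
    close $out if $out;
    close $in;
    return $count;
}

# Run only when invoked as a script with: input-file output-dir records-per-file
if (!caller && @ARGV == 3) {
    my ($file, $dir, $n) = @ARGV;
    my $files = split_marc($file, $dir, $n);
    print "Wrote $files file(s) to $dir\n";
}
```

For example, `perl split_marc.pl big.mrc out 100000` would write `out/batch_1.mrc`, `out/batch_2.mrc`, and so on, each holding at most 100,000 records. If records need to be checked or modified while splitting, reading them through MARC::Batch as Rob suggests is the safer choice.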
#!c:/perl/bin/perl.exe
#
# Name: mbreaker.pl
# Version: 0.1
# Date: Jan 2009
# Author: Saiful Amin <sai...@edutech.com>
#
# Description: Extract a range of MARC records, by position, based on
#              command-line parameters

use strict;
use warnings;
use Getopt::Long;
use MARC::Batch;

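# Record positions are counted from 1; both ends of the range are inclusive.
my $start = 0;
my $end   = 1;

GetOptions(
    "start=i" => \$start,
    "end=i"   => \$end,
) or die "Usage: $0 [-start N] [-end N] file\n";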

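die "Usage: $0 [-start N] [-end N] file\n" unless @ARGV;

my $batch = MARC::Batch->new('USMARC', $ARGV[0]);
$batch->strict_off();      # keep going past records MARC::Batch considers bad
$batch->warnings_off();    # don't print per-record warnings to STDERR
binmode STDOUT;            # write raw bytes (no CRLF translation on Windows)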

my $num = 0;
while (my $record = $batch->next()) {
    $num++;
    next if $num < $start;    # skip records before the range
    last if $num > $end;      # stop once past the range
    print $record->as_usmarc();
    warn "$num records\n" if $num % 1000 == 0;    # progress on STDERR
}


__END__

=head1 NAME

mbreaker.pl - extract a range of records from a MARC file

Writes records -start through -end (inclusive; the first record is 1)
of the input file to standard output.

=head1 SYNOPSIS

mbreaker.pl [options] file > output.mrc

    Options:
      -start    start position for reading records (first record is 1)
      -end      end position for reading records (inclusive)
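mbreaker.pl extracts one range per run, so splitting a whole file means running it once per chunk. The sketch below is a hypothetical helper (make_ranges.pl and chunk_ranges are illustrative names) that prints one mbreaker.pl command per chunk; it assumes you already know the total number of records in the file. Each run rereads the file from the beginning up to its start position, so for million-record files a single pass with a counter, as Rob describes, will be considerably faster.

```perl
#!/usr/bin/perl
# make_ranges.pl -- hypothetical helper; prints one mbreaker.pl command per chunk.
use strict;
use warnings;

# Return [start, end] pairs (1-based, inclusive, matching mbreaker.pl's
# -start/-end options) that cover records 1..$total in chunks of $size.
sub chunk_ranges {
    my ($total, $size) = @_;
    die "size must be positive\n" unless $size > 0;
    my @ranges;
    for (my $start = 1; $start <= $total; $start += $size) {
        my $end = $start + $size - 1;
        $end = $total if $end > $total;    # last chunk may be shorter
        push @ranges, [$start, $end];
    }
    return @ranges;
}

# Run only when invoked as a script with: file total-records chunk-size
if (!caller && @ARGV == 3) {
    my ($file, $total, $size) = @ARGV;
    my $k = 0;
    for my $r (chunk_ranges($total, $size)) {
        $k++;
        print "perl mbreaker.pl -start $r->[0] -end $r->[1] $file > batch_$k.mrc\n";
    }
}
```

For example, `perl make_ranges.pl records.mrc 1000000 100000` prints ten commands, the first being `perl mbreaker.pl -start 1 -end 100000 records.mrc > batch_1.mrc`.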
