Hi,

A long time ago, I wrote the following script, which splits a MARC file into chunks of a fixed number of records:

--- snippet ---
#!/usr/bin/env perl

use strict;
use warnings;

use MARC::File::USMARC;
use MARC::Record;

use Getopt::Long;

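# 'input' is a sentinel meaning "name the chunk files after the input file".
my $config = { output => 'input' };

GetOptions($config, 'input=s', 'chunk=i', 'output=s', 'max=i')
    or die "Usage: $0 --input file --chunk size [--output file] [--max count]\n";

# Both --input and --chunk are required, and the chunk size must be positive.
if (not defined $config->{input} or not defined $config->{chunk}
    or $config->{chunk} < 1) {
    die "Usage: $0 --input file --chunk size [--output file] [--max count]\n";
}

run($config->{input}, $config->{output}, $config->{chunk}, $config->{max});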

sub run {
    my ($input, $output, $chunk, $max) = @_;

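    my $marcfile = MARC::File::USMARC->in($input)
        or die "Cannot open $input: $!\n";

    # With no --output given, chunk files are named after the input file.
    my $fh = create_file($output eq 'input' ? $input : $output);
    my $cpt = 1;      # position of the current record within its chunk
    my $total = 0;    # records read so far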
    while (my $record = $marcfile->next) {
        $total++;

        # Stop once --max records have been written.
        last if defined $max and $total > $max;

        # Start a new file once the current one holds $chunk records.
        if ($cpt > $chunk) {
            close $fh;
            $fh = create_file($output eq 'input' ? $input : $output);
            $cpt = 1;
        }
        $cpt++;

        print $fh $record->as_usmarc;
    }
    close $fh;
}

sub create_file {
    my ($output) = @_;
    my $cpt = 0;

    my $filename = sprintf('%s.%03d', $output, $cpt++);
    while (-e $filename) {
        $filename = sprintf('%s.%03d', $output, $cpt++);
    }

    open my $fh, '>', $filename or die "Cannot write $filename: $!\n";
    return $fh;
}
--- snippet ---

Hope this helps.

Emmanuel Di Pretoro

2010/1/25 Nolte, Jennifer <jennifer.no...@yale.edu>

> Hello-
>
> I am working with files of MARC records that are over a million records
> each. I'd like to split them down into smaller chunks, preferably using a
> command line. MARCedit works, but is slow and made for the desktop. I've
> looked around and haven't found anything truly useful- Endeavor's MARCsplit
> comes close but doesn't separate files into even numbers, only by matching
> criteria, so there could be lots of record duplication between files.
>
> Any idea where to begin? I am a (super) novice Perl person.
>
> Thank you!
>
> ~Jenn Nolte
>
>
> Jenn Nolte
> Applications Manager / Database Analyst
> Production Systems Team
> Information Technology Office
> Yale University Library
> 130 Wall St.
> New Haven CT 06520
> 203 432 4878
>
>
>
