Emmanuel Di Pretoro
Mon, 25 Jan 2010 06:57:13 -0800
Hi,
A long time ago, I wrote the following:
--- snippet ---
#!/usr/bin/env perl
use strict;
use warnings;
use MARC::File::USMARC;
use MARC::Record;
use Getopt::Long;
# 'input' is a sentinel meaning "name the chunks after the input file".
my $config = { output => 'input' };
GetOptions($config, 'input=s', 'chunk=i', 'output=s', 'max=i');
# Both --input and --chunk are required.
if (not exists $config->{input} or not exists $config->{chunk}) {
    die "Usage: $0 --input file --chunk size [--output file] [--max count]\n";
} else {
    run($config->{input}, $config->{output}, $config->{chunk},
        $config->{max});
}
sub run {
    my ($input, $output, $chunk, $max) = @_;
    my $marcfile = MARC::File::USMARC->in($input)
        or die "Cannot open $input: $!\n";
    # Chunk files are named after the input file unless --output was given.
    my $base = $output eq 'input' ? $input : $output;
    my $fh = create_file($base);
    my $cpt = 0;      # records written to the current chunk
    my $total = 0;    # records read so far
    while (my $record = $marcfile->next) {
        $total++;
        last if defined $max and $total > $max;
        # Start a new file once the current one holds $chunk records.
        if ($cpt >= $chunk) {
            close $fh;
            $fh = create_file($base);
            $cpt = 0;
        }
        print $fh $record->as_usmarc;
        $cpt++;
    }
    close $fh;
}
sub create_file {
    my ($output) = @_;
    # Find the first unused name of the form "<output>.NNN".
    my $cpt = 0;
    my $filename = sprintf('%s.%03d', $output, $cpt++);
    while (-e $filename) {
        $filename = sprintf('%s.%03d', $output, $cpt++);
    }
    open my $fh, '>', $filename or die "Cannot create $filename: $!\n";
    return $fh;
}
--- snippet ---
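To check the result afterwards, something like the following could be used. It is only a sketch and not part of the script above: it assumes the chunks are plain MARC (ISO 2709) files in which every record ends with the 0x1D record terminator, and it counts terminators instead of parsing records. The names count_marc.pl and count_records are made up for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Count the records in a MARC (ISO 2709) file by reading up to each
# record terminator (0x1D) rather than parsing the records.
# Assumption: the file is raw MARC, not MARCXML or another format.
sub count_records {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!\n";
    local $/ = "\x1D";    # one <$fh> read returns one record
    my $count = 0;
    while (my $record = <$fh>) {
        # Ignore trailing bytes that are not closed by a terminator.
        $count++ if $record =~ /\x1D\z/;
    }
    close $fh;
    return $count;
}

# Print one line per file named on the command line.
printf "%s: %d records\n", $_, count_records($_) for @ARGV;
```

Saved as count_marc.pl (hypothetical name), it could be run as "perl count_marc.pl big.mrc.000 big.mrc.001", where the .000, .001 suffixes are the ones create_file produces.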
Hope this helps,
Emmanuel Di Pretoro
2010/1/25 Nolte, Jennifer <jennifer.no...@yale.edu>
> Hello-
>
> I am working with files of MARC records that are over a million records
> each. I'd like to split them down into smaller chunks, preferably using a
> command line. MARCedit works, but is slow and made for the desktop. I've
> looked around and haven't found anything truly useful- Endeavor's MARCsplit
> comes close but doesn't separate files into even numbers, only by matching
> criteria, so there could be lots of record duplication between files.
>
> Any idea where to begin? I am a (super) novice Perl person.
>
> Thank you!
>
> ~Jenn Nolte
>
>
> Jenn Nolte
> Applications Manager / Database Analyst
> Production Systems Team
> Information Technology Office
> Yale University Library
> 130 Wall St.
> New Haven CT 06520
> 203 432 4878