Hi,

A long time ago I wrote the following:
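If you save it as, say, marc_split.pl (the script has no official name, and the file names below are only examples), a typical run looks like this:

  perl marc_split.pl --input records.mrc --chunk 50000

That writes records.mrc.000, records.mrc.001, and so on, each holding 50000 records. You can pass --output to use a different base name for the pieces, and --max to stop after a given number of records.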
--- snippet ---
#!/usr/bin/env perl

use strict;
use warnings;

use MARC::File::USMARC;
use MARC::Record;
use Getopt::Long;

# 'input' is a sentinel meaning "name the output files after the input file"
my $config = { output => 'input' };
GetOptions($config, 'input=s', 'chunk=i', 'output=s', 'max=i');

# --input and --chunk are both required
if (not exists $config->{input} or not exists $config->{chunk}) {
    die "Usage: $0 --input file --chunk size [--output file] [--max records]\n";
}

run($config->{input}, $config->{output}, $config->{chunk}, $config->{max});

sub run {
    my ($input, $output, $chunk, $max) = @_;

    my $marcfile = MARC::File::USMARC->in($input)
        or die "Cannot open $input\n";

    # Base name for the numbered output files
    my $base = $output eq 'input' ? $input : $output;

    my $fh    = create_file($base);
    my $cpt   = 1;  # records written to the current chunk
    my $total = 0;  # records read overall

    while (my $record = $marcfile->next) {
        $total++;
        last if defined $max and $total > $max;

        # Start a new file once the current chunk is full
        if ($cpt > $chunk) {
            close $fh;
            $fh  = create_file($base);
            $cpt = 1;
        }

        print $fh $record->as_usmarc;
        $cpt++;
    }
    close $fh;
}

# Open the first file named "$base.NNN" that doesn't already exist
sub create_file {
    my ($base) = @_;

    my $cpt = 0;
    my $filename = sprintf('%s.%03d', $base, $cpt++);
    while (-e $filename) {
        $filename = sprintf('%s.%03d', $base, $cpt++);
    }
    open my $fh, '>', $filename
        or die "Cannot open $filename: $!\n";
    return $fh;
}
--- snippet ---

Hope this helps,

Emmanuel Di Pretoro

2010/1/25 Nolte, Jennifer <jennifer.no...@yale.edu>

> Hello-
>
> I am working with files of MARC records that are over a million records
> each. I'd like to split them down into smaller chunks, preferably using a
> command line. MarcEdit works, but is slow and made for the desktop. I've
> looked around and haven't found anything truly useful. Endeavor's MARCsplit
> comes close, but doesn't separate files into even chunks, only by matching
> criteria, so there could be lots of record duplication between files.
>
> Any idea where to begin? I am a (super) novice Perl person.
>
> Thank you!
>
> ~Jenn Nolte
>
>
> Jenn Nolte
> Applications Manager / Database Analyst
> Production Systems Team
> Information Technology Office
> Yale University Library
> 130 Wall St.
> New Haven CT 06520
> 203 432 4878