This isn't a Perl solution, but it may work for you.

You can use the Unix split command to split a file into several other
files with the same number of lines each. For that to work, you'll first
have to use tr to convert the ^] (0x1D, octal \035) record terminators
into newlines, then use tr again to convert them back in each split
file. Note that tr treats '^]' as the two literal characters ^ and ],
so use the octal escape \035 for the control character instead.

#> tr '\035' '\n' < filename > filename.nl
#> split -l $lines_per_file filename.nl SPLIT
#> for file in SPLIT*; do tr '\n' '\035' < "$file" > "$file.rs"; done

Or something like that.
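To make the idea concrete, here's the same pipeline run end to end on a
tiny synthetic file (the file name sample.marc, the two-records-per-file
count, and the SPLIT prefix are just placeholders for this sketch, not
anything MARC-specific):

```shell
# Build a fake 4-record file: each "record" ends with the 0x1D terminator.
printf 'rec1\035rec2\035rec3\035rec4\035' > sample.marc

# 1. Turn record terminators into newlines so split can count records.
tr '\035' '\n' < sample.marc > sample.nl

# 2. Split into files of 2 lines (= 2 records) each: SPLITaa, SPLITab, ...
split -l 2 sample.nl SPLIT

# 3. Turn the newlines back into record terminators in each piece.
for f in SPLITaa SPLITab; do
    tr '\n' '\035' < "$f" > "$f.marc"
done
```

After this, SPLITaa.marc holds the first two records and SPLITab.marc
the last two, each with its 0x1D terminators restored.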

/dev
-- 

Devon Smith
Consulting Software Engineer
OCLC Office of Research



-----Original Message-----
From: Nolte, Jennifer [mailto:jennifer.no...@yale.edu] 
Sent: Monday, January 25, 2010 9:48 AM
To: perl4lib@perl.org
Subject: Splitting a large file of MARC records into smaller files

Hello-

I am working with files of MARC records that are over a million records
each. I'd like to split them down into smaller chunks, preferably using
a command line. MARCedit works, but is slow and made for the desktop.
I've looked around and haven't found anything truly useful- Endeavor's
MARCsplit comes close but doesn't separate files into even numbers, only
by matching criteria, so there could be lots of record duplication
between files.

Any idea where to begin? I am a (super) novice Perl person.

Thank you!

~Jenn Nolte


Jenn Nolte
Applications Manager / Database Analyst
Production Systems Team
Information Technology Office
Yale University Library
130 Wall St.
New Haven CT 06520
203 432 4878



