Ashley Sanders
Mon, 25 Jan 2010 07:41:22 -0800
Jennifer,
> I am working with files of MARC records that are over a million records each. I'd like to split them down into smaller chunks, preferably using a command line. MARCedit works, but is slow and made for the desktop. I've looked around and haven't found anything truly useful. Endeavor's MARCsplit comes close but doesn't separate files into even numbers, only by matching criteria, so there could be lots of record duplication between files. Any idea where to begin? I am a (super) novice Perl person.
Well... if you have a *nix-style command line with the usual utilities, and your file of MARC records is in exchange format with the records delimited only by the end-of-record character 0x1d, then you could do something like this:

    tr '\035' '\n' < my-marc-file.mrc > recs.txt
    split -l 1000 recs.txt

The tr command will turn the MARC end-of-record characters into newlines. Then use the split command to carve up the output of tr into files of 1000 records each. You may then have to use tr to convert the newlines back to MARC end-of-record characters.

Ashley.
--
Ashley Sanders a.sand...@manchester.ac.uk
Copac http://copac.ac.uk
A Mimas service funded by JISC
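
For what it's worth, the whole round trip described above (tr, split, then tr back again) could be wrapped up roughly as follows. This is only a sketch: the function name split_marc and the output prefix are made up for illustration, and it assumes the records contain no newline bytes of their own and that the file ends with an end-of-record character and nothing after it.

```shell
#!/bin/sh
# Sketch: split a MARC exchange-format file into pieces of N records each.
# Assumptions (not checked): every record ends with 0x1d (octal 035),
# no record contains a newline byte, and nothing follows the last 0x1d.
# split_marc and its arguments are hypothetical names for illustration.
split_marc() {
    infile=$1      # input MARC file
    per_chunk=$2   # number of records per output file
    prefix=$3      # output file prefix, e.g. "chunk."

    # One record per line, then carve into pieces of per_chunk lines.
    # split names the pieces <prefix>aa, <prefix>ab, and so on.
    tr '\035' '\n' < "$infile" | split -l "$per_chunk" - "$prefix"

    # Turn the newlines back into end-of-record characters, writing each
    # piece to <piece>.mrc and removing the intermediate text file.
    for piece in "$prefix"*; do
        tr '\n' '\035' < "$piece" > "$piece.mrc" && rm "$piece"
    done
}
```

Called as, say, split_marc my-marc-file.mrc 1000 chunk. it should leave chunk.aa.mrc, chunk.ab.mrc and so on, which can be concatenated in name order to get the original file back. With files of over a million records the default two-letter suffixes allow only 676 pieces, so a chunk size of 1000 would need split's -a option to lengthen the suffix.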