Hm. You don't need to keep all 800k records in memory -- just the data you need, right? I'd keep a hash keyed by authorized heading, with the values I need as the entries.
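For instance, a rough sketch of the counting pass (the headings here are made up, and the commented-out pymarc read loop is just how I'd expect it to look -- check it against the pymarc docs before relying on it):

```python
from collections import Counter

# With pymarc the read loop would look something like:
#   from pymarc import MARCReader
#   with open('innz.mrc', 'rb') as fh:
#       for record in MARCReader(fh):
#           for field in record.get_fields('100', '700'):
#               counts[field.format_field()] += 1
# Here the extracted headings are simulated to show the tally itself.

def tally(headings):
    """Return a Counter mapping authorized heading -> reference count."""
    return Counter(headings)

headings = [
    'Mansfield, Katherine, 1888-1923.',
    'Frame, Janet.',
    'Mansfield, Katherine, 1888-1923.',
]
counts = tally(headings)

# most_common() gives you the "sorted by how many times they're
# referenced" ordering for free.
for heading, n in counts.most_common():
    print(f'{n}\t{heading}')
```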

I don't think you'll have trouble keeping such a hash in memory for a batch process run manually once in a while. Modern OSes do a great job with virtual memory, making it invisible (but slower) when you use more memory than you physically have -- if it even comes to that, which it may not.

If you do, you could keep the data you need in the data store of your choice, such as a local DBM database. Ruby, Python, and Perl will all let you do that pretty painlessly: you get a hash-like data structure that is actually stored on disk rather than in memory, but which you access more or less the same as an in-memory hash.
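In Python that's the stdlib `shelve` module (a dict-like wrapper over DBM). A minimal sketch, with made-up headings and a throwaway temp-file path:

```python
import os
import shelve
import tempfile

# A shelve database behaves like a dict but is stored on disk, so the
# working set never has to fit in RAM. The path here is just for demo.
path = os.path.join(tempfile.mkdtemp(), 'headings')

with shelve.open(path) as db:
    for heading in ['Mansfield, Katherine, 1888-1923.',
                    'Frame, Janet.',
                    'Mansfield, Katherine, 1888-1923.']:
        db[heading] = db.get(heading, 0) + 1

# Reopen later -- the counts are still there, read back from disk.
with shelve.open(path) as db:
    print(db['Mansfield, Katherine, 1888-1923.'])  # 2
```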

But, yes, it will require some programming, for sure.
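The wikimedia-syntax output for (a) is then just string formatting over the tally. A sketch with hypothetical counts (the exact wikitext you want is up to you):

```python
# Hypothetical tallies; in practice these come from the counting pass.
counts = {
    'Mansfield, Katherine, 1888-1923.': 2,
    'Frame, Janet.': 1,
}

# One MediaWiki bullet per heading, most-referenced first.
lines = ['* {} ({})'.format(heading, n)
         for heading, n in sorted(counts.items(), key=lambda kv: -kv[1])]
wikitext = '\n'.join(lines)
print(wikitext)
```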

A "MARC indexer" can mean many things, and I'm not sure you need one here. But as it happens, I have built something you could describe as a MARC indexer, and I'll grant it wasn't exactly straightforward. I'm not sure it's of any use for your use case, but you can check it out at https://github.com/traject-project/traject

On 11/2/14 9:29 PM, Stuart Yeates wrote:
Do any of these have built-in indexing? 800k records isn't going to
fit in memory and if building my own MARC indexer is 'relatively
straightforward' then you're a better coder than I am.

cheers stuart

-- I have a new phone number: 04 463 5692

________________________________________
From: Code for Libraries <CODE4LIB@LISTSERV.ND.EDU> on behalf of Jonathan Rochkind <rochk...@jhu.edu>
Sent: Monday, 3 November 2014 1:24 p.m.
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] MARC reporting engine

If you are, can become, or know, a programmer, that would be
relatively straightforward in any programming language using the open
source MARC processing library for that language. (ruby marc, pymarc,
perl marc, whatever).

Although you might find more trouble than you expect around
authorities, with them being less standardized in your corpus than
you might like.

________________________________________
From: Code for Libraries [CODE4LIB@LISTSERV.ND.EDU] on behalf of Stuart Yeates [stuart.yea...@vuw.ac.nz]
Sent: Sunday, November 02, 2014 5:48 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: [CODE4LIB] MARC reporting engine

I have ~800,000 MARC records from an indexing service
(http://natlib.govt.nz/about-us/open-data/innz-metadata CC-BY). I am
trying to generate:

(a) a list of person authorities (and sundry metadata), sorted by how
many times they're referenced, in wikimedia syntax

(b) a view of a person authority, with all the records by which
they're referenced, processed into a wikipedia stub biography

I have established that this is too much data to process in XSLT or
multi-line regexps in vi. What other MARC engines are there out
there?

The two options I'm aware of are learning multi-line processing in
sed or learning enough koha to write reports in whatever their
reporting engine is.

Any advice?

cheers stuart
--
I have a new phone number: 04 463 5692

