Do any of these have built-in indexing? 800k records aren't going to fit in
memory, and if building my own MARC indexer is 'relatively straightforward' then
you're a better coder than I am.
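
To be concrete, what I mean by an indexer is something like the sqlite sketch
below, which assumes pymarc, guesses at which fields the personal names live in,
and which I have no confidence will cope with 800,000 records:

import sqlite3
from pymarc import MARCReader

# A heading -> record table on disk, so neither the records nor the tallies
# have to sit in memory.
db = sqlite3.connect('authority-index.db')              # hypothetical filename
db.execute('CREATE TABLE IF NOT EXISTS refs (heading TEXT, title TEXT)')
db.execute('CREATE INDEX IF NOT EXISTS refs_heading ON refs (heading)')

with open('innz-metadata.mrc', 'rb') as fh:             # hypothetical filename
    for record in MARCReader(fh):
        if record is None:                              # skip anything pymarc can't parse
            continue
        title_field = record['245']
        title = title_field['a'] if title_field else ''
        for field in record.get_fields('100', '700'):   # guessing at the name fields
            if field['a']:
                db.execute('INSERT INTO refs VALUES (?, ?)', (field['a'], title))
db.commit()

# (a) becomes a GROUP BY; (b) becomes a SELECT ... WHERE heading = ?
for heading, n in db.execute(
        'SELECT heading, COUNT(*) FROM refs GROUP BY heading ORDER BY COUNT(*) DESC'):
    print('| %s || %d' % (heading, n))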

cheers
stuart

--
I have a new phone number: 04 463 5692

________________________________________
From: Code for Libraries <[email protected]> on behalf of Jonathan 
Rochkind <[email protected]>
Sent: Monday, 3 November 2014 1:24 p.m.
To: [email protected]
Subject: Re: [CODE4LIB] MARC reporting engine

If you are, can become, or know a programmer, that would be relatively
straightforward in any programming language, using the open source MARC
processing library for that language (ruby marc, pymarc, perl marc, whatever).

You might find more trouble than you expect around authorities, though, since
they may be less standardized in your corpus than you'd like.
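
As a rough sketch of what I mean by 'relatively straightforward', something
along these lines with pymarc would get you (a). It's untested, and it assumes
the records come as a single binary .mrc file with the personal names in
100/700 $a, both of which you'd want to check against your data:

from collections import Counter
from pymarc import MARCReader

counts = Counter()

# Read the file one record at a time.
with open('innz-metadata.mrc', 'rb') as fh:        # hypothetical filename
    for record in MARCReader(fh):
        if record is None:                         # newer pymarc yields None for unparseable records
            continue
        for field in record.get_fields('100', '700'):
            name = field['a']
            if name:
                counts[name.rstrip(' ,.')] += 1    # crude normalization of trailing punctuation

# (a): a sortable wikitable, most-referenced headings first.
print('{| class="wikitable sortable"')
print('! Heading !! References')
for name, n in counts.most_common():
    print('|-')
    print('| %s || %d' % (name, n))
print('|}')

(b) is basically the same loop, filtered down to one heading and formatted as a
stub instead of a table.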
________________________________________
From: Code for Libraries [[email protected]] on behalf of Stuart Yeates 
[[email protected]]
Sent: Sunday, November 02, 2014 5:48 PM
To: [email protected]
Subject: [CODE4LIB] MARC reporting engine

I have ~800,000 MARC records from an indexing service 
(http://natlib.govt.nz/about-us/open-data/innz-metadata CC-BY). I am trying to 
generate:

(a) a list of person authorities (and sundry metadata), sorted by how many 
times they're referenced, in wikimedia syntax

(b) a view of a person authority, with all the records that reference it,
processed into a wikipedia stub biography

I have established that this is too much data to process in XSLT or with
multi-line regexps in vi. What other MARC engines are out there?

The two options I'm aware of are learning multi-line processing in sed and
learning enough Koha to write reports in whatever its reporting engine is.

Any advice?

cheers
stuart
--
I have a new phone number: 04 463 5692
