The MARC XML appears to be gzipped twice: I had to gunzip it to get 
innzmetadata.xml, then rename that to innzmetadata.xml.gz and gunzip again to 
get the actual XML.
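
If anyone wants to avoid the rename-and-gunzip dance, here is a rough Python sketch that keeps stripping gzip layers until the payload no longer starts with the gzip magic bytes. The function name and the max_layers guard are my own invention, and it holds the whole payload in memory, so treat it as an illustration rather than something tuned for the full dump.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def gunzip_fully(data: bytes, max_layers: int = 5) -> bytes:
    """Remove nested gzip layers (hypothetical helper, not part of any tool).

    max_layers is an arbitrary safety limit so a malformed input cannot
    keep us decompressing forever.
    """
    for _ in range(max_layers):
        if not data.startswith(GZIP_MAGIC):
            return data  # no gzip header left: this is the real payload
        data = gzip.decompress(data)
    if data.startswith(GZIP_MAGIC):
        raise ValueError(f"still gzipped after {max_layers} layers")
    return data


if __name__ == "__main__":
    # Simulate the "archive within an archive" with a tiny stand-in payload.
    payload = b"<collection/>"
    twice = gzip.compress(gzip.compress(payload))
    print(gunzip_fully(twice) == payload)
```

Checking the magic bytes rather than the file extension is what makes this work here, since the inner file was named .xml while still being gzipped.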

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: o...@ostephens.com
Telephone: 0121 288 6936

> On 3 Nov 2014, at 22:38, Robert Haschart <rh...@virginia.edu> wrote:
> 
> I was going to echo Erik Hatcher's recommendation of Solr and SolrMarc, since 
> I'm the creator of SolrMarc.
> It does provide many of the same tools as are described in the toolset you 
> linked to, but it is designed to write to Solr rather than to a SQL-style 
> database. Solr may or may not be more suitable for your needs than a SQL 
> database. However, I decided to download the data to see whether SolrMarc 
> could handle it. I started with the MARCXML.gz data, ungzipped it to get a 
> .XML file, but the resulting file causes SolrMarc to blow chunks. Either 
> I'm missing something or there is something way wrong with that data. The 
> gzipped binary MARC file works fine with the SolrMarc tools.
> 
> Creating a SolrMarc script to extract the 700 fields, plus a bash script to 
> cluster and count them, and sort by frequency took about 20 minutes.
> 
> -Bob Haschart
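
For anyone without SolrMarc to hand, the extract-cluster-count step Bob describes can be approximated with the Python standard library alone. This is not Bob's script, just a sketch under some assumptions: the input is namespaced MARCXML (the Library of Congress "slim" schema), only subfield $a of each 700 field is counted, and the sample records and names below are made up for the demo. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: records use the standard MARCXML ("slim") namespace.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Made-up sample data, only here so the sketch runs on its own.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="700" ind1="1" ind2=" ">
      <subfield code="a">Smith, John</subfield>
    </datafield>
    <datafield tag="700" ind1="1" ind2=" ">
      <subfield code="a">Jones, Mary</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="700" ind1="1" ind2=" ">
      <subfield code="a">Smith, John</subfield>
    </datafield>
  </record>
</collection>"""


def count_700a(marcxml: str) -> Counter:
    """Count occurrences of each 700$a value in a MARCXML string."""
    root = ET.fromstring(marcxml)
    counts = Counter()
    for field in root.iterfind(".//marc:datafield[@tag='700']", MARC_NS):
        for sub in field.iterfind("marc:subfield[@code='a']", MARC_NS):
            name = (sub.text or "").strip()
            if name:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # most_common() sorts by descending frequency.
    for name, n in count_700a(SAMPLE).most_common():
        print(n, name)
```

Note that ET.fromstring reads the whole document into memory; for the full 800k-record file something incremental such as ET.iterparse would probably be the safer choice.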
> 
> 
> On 11/3/2014 3:00 PM, Stuart Yeates wrote:
>> Thank you to all who responded with software suggestions. 
>> https://github.com/ubleipzig/marctools is looking like the most promising 
>> candidate so far. The more I read through the recommendations the more it 
>> dawned on me that I don't want to have to configure yet another Java 
>> toolchain (yes I know, that may be personal bias).
>> 
>> Thank you to all who responded about the challenges of authority control in 
>> such collections. I'm aware of these issues. The current project is about 
>> marshalling resources for editors to make informed decisions, rather than 
>> automating the creation of articles. Because there is human judgement 
>> involved in the last step, I can afford to take a few authority control 
>> 'risks'.
>> 
>> cheers
>> stuart
>> 
>> --
>> I have a new phone number: 04 463 5692
>> 
>> ________________________________________
>> From: Code for Libraries<CODE4LIB@LISTSERV.ND.EDU>  on behalf of raffaele 
>> messuti<raffaele.mess...@gmail.com>
>> Sent: Monday, 3 November 2014 11:39 p.m.
>> To: CODE4LIB@LISTSERV.ND.EDU
>> Subject: Re: [CODE4LIB] MARC reporting engine
>> 
>> Stuart Yeates wrote:
>>> Do any of these have built-in indexing? 800k records isn't going to fit in 
>>> memory and if building my own MARC indexer is 'relatively straightforward' 
>>> then you're a better coder than I am.
>> you could try marcdb[1] from marctools[2]
>> 
>> [1] https://github.com/ubleipzig/marctools#marcdb
>> [2] https://github.com/ubleipzig/marctools
>> 
>> 
>> --
>> raffaele
