Re: [CODE4LIB] CODE4LIB Digest - 7 Sep 2013 to 8 Sep 2013 (#2013-231)

[email protected] Sun, 08 Sep 2013 21:36:49 -0700

Reads in MARCXML and stores the records in a database. Supports deduplication and editing of records; records can be exported by hand or by cron job.


http://sourceforge.net/projects/bibnet/

It's a specialized application for journal articles, but it may work for any kind of MARCXML, with some restrictions.


Can handle large sets of data.

Markus Fischer

On 09.09.2013 05:00, CODE4LIB automatic digest system wrote:

There are 2 messages totaling 79 lines in this issue.

Topics of the day:

   1. XML split and transform in Java (2)

----------------------------------------------------------------------

Date:    Sun, 8 Sep 2013 16:22:22 +0000
From:    Tod Olson <[email protected]>
Subject: XML split and transform in Java

code4lib,

I'm looking for some advice on splitting and transforming XML data using Java. 
The context is writing a mixin for SolrMARC to enhance our bib data, bringing 
in table of contents and summary data. The data is in XML, isomorphic to 
MARCXML. I need to split it up, transform it, and store it for use at import 
time. I expect the input XML to be up to a few GB, so slurping the whole thing 
into a DOM seems questionable. I've done one implementation for a split-only 
version of the problem, but the transform requirement is causing me to re-think.

And maybe someone out there has already done this exact thing.

I think the basic approach is to read a record from start tag to end tag, and 
create a reader/stream/whatever to hand exactly that record to the transform 
API. Lots of options for this: SAX, StAX events, or what have you. Any thoughts 
of what seems the most straightforward for this split-and-transform scenario 
would be welcome.
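
For illustration, the read-one-record-then-transform idea could be sketched 
roughly as follows, using StAX events and the standard javax.xml.transform API. 
This is only a sketch under assumptions, not a tested SolrMARC mixin: the class 
name RecordSplitter, the method names, and the element name "record" are 
hypothetical, and it assumes each record is either un-namespaced or carries its 
own namespace declarations (declarations made only on the enclosing root element 
would need extra handling).

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RecordSplitter {

    /**
     * Streams the input and, for every element whose local name is "record"
     * (an assumed name), transforms that one record and passes the result to sink.
     * Only one record is held in memory at a time.
     */
    public static void splitAndTransform(Reader input, Transformer transformer,
                                         Consumer<String> sink)
            throws XMLStreamException, TransformerException {
        XMLEventReader events = XMLInputFactory.newInstance().createXMLEventReader(input);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        try {
            while (events.hasNext()) {
                XMLEvent event = events.nextEvent();
                if (event.isStartElement()
                        && "record".equals(event.asStartElement().getName().getLocalPart())) {
                    String recordXml = copyRecord(event, events, outputFactory);
                    StringWriter out = new StringWriter();
                    transformer.transform(new StreamSource(new StringReader(recordXml)),
                                          new StreamResult(out));
                    sink.accept(out.toString());
                }
            }
        } finally {
            events.close();
        }
    }

    /** Copies events from the record's start tag through its matching end tag. */
    private static String copyRecord(XMLEvent start, XMLEventReader events,
                                     XMLOutputFactory outputFactory)
            throws XMLStreamException {
        StringWriter buffer = new StringWriter();
        XMLEventWriter writer = outputFactory.createXMLEventWriter(buffer);
        writer.add(start);
        int depth = 1; // nesting depth relative to the record element
        while (depth > 0) {
            XMLEvent event = events.nextEvent();
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement()) {
                depth--;
            }
            writer.add(event);
        }
        writer.flush();
        writer.close();
        return buffer.toString();
    }
}
```

The Transformer would typically be built once from an XSLT stylesheet via 
TransformerFactory.newTransformer(Source) and reused for every record; the sink 
is where each transformed record would be written to whatever store is chosen.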

On a related note, any thoughts on your favorite light-weight key/value pair 
persistent storage for Java would be welcome. I expect the data to be a little 
large for a serialized HashMap.

Best,

-Tod

Tod Olson <[email protected]>
Systems Librarian
University of Chicago Library

------------------------------

Date:    Sun, 8 Sep 2013 20:22:24 +0200
From:    Chris Fitzpatrick <[email protected]>
Subject: Re: XML split and transform in Java

Hi,

Would something like this work?

https://github.com/marc4j/marc4j/blob/master/src/org/marc4j/samples/StylesheetChainExample.java

------------------------------

End of CODE4LIB Digest - 7 Sep 2013 to 8 Sep 2013 (#2013-231)
*************************************************************
