The Texas Digital Library has been investigating using ORE and OAI-PMH in conjunction with handling ETDs from various schools across Texas in a federated collection. I would like to briefly cover the background and design that we are looking at to see if anyone has any feedback or comments on this project. Our primary use case is: we have several IRs across the state that have ETD collections for their respective institutions and we would like to create a single federated collection that aggregates those ETDs and keeps itself automatically updated.

Over the last several months TAMU Libraries have been following the development of the OAI Object Reuse and Exchange (ORE) specification. Its primary feature is the ORE Model, an abstract data model for expressing relationships between web resources. The term they use for such a relationship is “Aggregation” and the web resources they describe are called “Aggregated Resources”. A concrete representation of an Aggregation in some readable format is called a “Resource Map”.

Given the potential of this new format for repository interoperability and the wide adoption of DSpace in the Texas Digital Library, we are currently looking at adding OAI-ORE support to DSpace. This would include the capability to both disseminate and harvest ORE objects. Our approach is as follows.

First, we need a way to disseminate ORE resource maps from a DSpace repository. This is a two-part problem. The first part is creating a mapping between the DSpace architecture and the ORE data model; in other words, generating ORE resource maps from DSpace objects. We also needed a specific serialization format to express the abstract resource map as a concrete and usable representation. For this we chose Atom XML. Following the documentation and examples from the ORE specifications, specifically “Resource Map Implementation in Atom”, the mapping was fairly straightforward to create. I've attached an example copy of such a mapping.
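To make the mapping idea concrete, here is a minimal Python sketch (not the actual DSpace crosswalk, which lives in the Java codebase). It assumes an item is available as a plain dictionary; the function name `item_to_resource_map` and the dictionary keys are hypothetical and chosen only for illustration. Each bitstream becomes an `ore:aggregates` link, mirroring the attached example.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ORE_DESCRIBES = "http://www.openarchives.org/ore/terms/describes"
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

# Emit the "atom:" prefix instead of an auto-generated one.
ET.register_namespace("atom", ATOM)


def item_to_resource_map(item):
    """Serialize an item dictionary as an Atom entry (hypothetical helper)."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = item["rem_uri"]
    ET.SubElement(entry, f"{{{ATOM}}}title").text = item["title"]
    # Link from the resource map to the aggregation it describes.
    ET.SubElement(entry, f"{{{ATOM}}}link", rel=ORE_DESCRIBES, href=item["rem_uri"])
    # One link per bitstream, i.e. per aggregated resource.
    for bitstream in item["bitstreams"]:
        ET.SubElement(
            entry,
            f"{{{ATOM}}}link",
            rel=ORE_AGGREGATES,
            href=bitstream["url"],
            type=bitstream["mime"],
        )
    return ET.tostring(entry, encoding="unicode")


# Illustrative item, with values taken from the attached example.
EXAMPLE_ITEM = {
    "rem_uri": "http://repository.tamu.edu/dspace-oai/metadata/handle/1969.1/316/ore.xml",
    "title": "Modeling high-genus surfaces",
    "bitstreams": [
        {
            "url": "http://repository.tamu.edu/bitstream/handle/1969.1/316/MODS.xml?sequence=3",
            "mime": "text/xml",
        },
        {
            "url": "http://repository.tamu.edu/bitstream/handle/1969.1/316/METS.xml?sequence=4",
            "mime": "text/xml",
        },
    ],
}

if __name__ == "__main__":
    print(item_to_resource_map(EXAMPLE_ITEM))
```

The sketch leaves out the categories, timestamps and RDF triples that the attached example carries; those would be added in the same way.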

The second part is actually disseminating the results; that is, providing the resource maps with a URI from which they can be consistently accessed. Following the guidelines set forth in the “Resource Map Discovery” guide, we chose to simply disseminate the ORE resource maps through existing OAI-PMH means. This entailed the creation of an ORE/Atom dissemination crosswalk in DSpace that serves out ORE as one of the available metadata formats. In addition to OAI-PMH, the ORE resource maps are also provided by other services using the same crosswalk, for example the Manakin XMLUI.
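For readers less familiar with OAI-PMH, requesting a resource map this way is an ordinary `GetRecord` or `ListRecords` call with the ORE format named in `metadataPrefix`. The Python sketch below only builds the request URL. The base URL, the `ore` prefix value and the OAI identifier are assumptions made for illustration; the real values depend on how the provider is configured.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


# Assumed values, for illustration only.
BASE_URL = "http://repository.tamu.edu/dspace-oai/request"

url = oai_request_url(
    BASE_URL,
    "GetRecord",
    identifier="oai:repository.tamu.edu:1969.1/316",
    metadataPrefix="ore",
)

if __name__ == "__main__":
    print(url)
```

A `ListMetadataFormats` request against the same base URL would show whether a given provider actually offers the ORE format and under which prefix.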

Done in this fashion, the dissemination aspect of ORE support does not require any core changes to DSpace itself. The changes are limited to a new dissemination crosswalk, plus whatever changes Manakin and the OAI webapp need in order to serve the resource maps directly through a URI in addition to an OAI-PMH request.

The harvesting aspect is more complex and requires more extensive changes to core DSpace. First and foremost, DSpace needed the ability to contact remote OAI-PMH providers and harvest data from them. This would make a DSpace repository not just a data provider, but potentially also a service provider under the OAI-PMH architecture. Creating such a harvester, and giving it the ability to perform automatic iterative updates, entailed the addition of a new table to the DSpace database to relate collections with their harvesting information.
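The role of that table can be sketched as follows. This is an illustration in Python with SQLite, not the actual DSpace schema: the table name, the column names and both helper functions are hypothetical. The point is that storing the time of the last harvest per collection lets the next `ListRecords` request use the standard OAI-PMH `from` argument to fetch only what has changed.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE harvested_collection (
    collection_id   INTEGER PRIMARY KEY,
    oai_source      TEXT NOT NULL,   -- base URL of the remote OAI-PMH provider
    oai_set_id      TEXT,            -- remote set to harvest, if any
    metadata_prefix TEXT NOT NULL,   -- format to request, e.g. the ORE prefix
    last_harvested  TEXT             -- UTC timestamp of the last successful run
)
"""


def open_store():
    """Create an in-memory database holding the harvesting table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def next_request_params(conn, collection_id):
    """Return (base_url, ListRecords arguments) for the next harvest run."""
    row = conn.execute(
        "SELECT oai_source, oai_set_id, metadata_prefix, last_harvested "
        "FROM harvested_collection WHERE collection_id = ?",
        (collection_id,),
    ).fetchone()
    if row is None:
        raise KeyError(f"collection {collection_id} is not set up for harvesting")
    source, set_id, prefix, last_harvested = row
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    if set_id:
        params["set"] = set_id
    if last_harvested:
        # Incremental update: only records changed since the last run.
        params["from"] = last_harvested
    return source, params


def record_harvest(conn, collection_id, finished_at):
    """Store the timestamp of a successful harvest."""
    with conn:
        conn.execute(
            "UPDATE harvested_collection SET last_harvested = ? "
            "WHERE collection_id = ?",
            (finished_at, collection_id),
        )
```

The first run for a collection has no stored timestamp, so it requests everything; every later run adds `from` and picks up only new or modified records.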

Once a basic OAI-PMH harvester was complete, it was extended to also function as an ORE harvester by the addition of the appropriate ingestion crosswalk. The OREIngestionCrosswalk can parse an ORE document, build a basic DSpace item and then fill in its metadata and bitstreams by following the URI references in the Resource Map and generating a secondary OAI-PMH request for descriptive metadata. Once all these pieces were completed, their capabilities needed to be made usable through DSpace’s interfaces, such as the command-line interface, the Manakin UI, JSPUI and so forth.
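The parsing step of that crosswalk can be illustrated with a short Python sketch. It is not the OREIngestionCrosswalk itself; the function name `aggregated_resources` is hypothetical and the sample document is a trimmed copy of the attached example. It collects the URI and media type of every `ore:aggregates` link, which is the list of bitstreams a harvester would then fetch.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"

# Trimmed-down version of the attached resource map.
SAMPLE = """<atom:entry xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:title>Modeling high-genus surfaces</atom:title>
  <atom:link rel="alternate" type="text/html"
      href="http://handle.tamu.edu/1969.1/316"/>
  <atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
      href="http://repository.tamu.edu/bitstream/handle/1969.1/316/etd-tamu-2004A-ARCH-Srinivasan-1.pdf?sequence=1"
      type="application/pdf"/>
  <atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
      href="http://repository.tamu.edu/bitstream/handle/1969.1/316/MODS.xml?sequence=3"
      type="text/xml"/>
</atom:entry>"""


def aggregated_resources(resource_map_xml):
    """Return (href, media type) for each aggregated resource in the entry."""
    entry = ET.fromstring(resource_map_xml)
    return [
        (link.get("href"), link.get("type"))
        for link in entry.findall("atom:link", ATOM_NS)
        if link.get("rel") == ORE_AGGREGATES
    ]


if __name__ == "__main__":
    for href, media_type in aggregated_resources(SAMPLE):
        print(media_type, href)
```

The `alternate` link is skipped because it points at the item's landing page rather than at an aggregated resource.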

Furthermore, since we added capabilities to make the harvests automatic and recurring, this could allow us to keep a harvested DSpace collection “in sync” with an external DSpace collection it is being harvested from. This is the primary use case that motivated TDL to look into using ORE for interoperability between the many IRs under its umbrella. The potential, however, is even greater. Because ORE resource maps simply describe aggregations of web-accessible resources without regard to how that access is made possible, this could in theory allow for seamless interoperability between a DSpace IR and any other ORE-compliant service.



Feedback is definitely welcome, so please reply if you have comments, questions, or suggestions.

Alexey Maslov

<?xml version="1.0" encoding="UTF-8"?>

<!-- Creating a simple mapping from the DSpace data model into an atom feed -->
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:ore="http://www.openarchives.org/ore/terms/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:oreatom="http://www.openarchives.org/ore/atom/"
    xmlns:dcterms="http://purl.org/dc/terms/">
    
    <atom:id>http://repository.tamu.edu/dspace-oai/metadata/handle/1969.1/316/ore.xml</atom:id>
    <atom:link rel="alternate" type="text/html" href="http://handle.tamu.edu/1969.1/316"/>
    
    <!-- URI-A -->
    <atom:link rel="http://www.openarchives.org/ore/terms/describes" href="http://repository.tamu.edu/dspace-oai/metadata/handle/1969.1/316/ore.xml"/>
    <!-- URI-R -->
    <atom:link rel="self" href="http://repository.tamu.edu/dspace-oai/metadata/handle/1969.1/316/ore.xml#atom" type="application/atom+xml"/>
    
    <!-- Update time (now); generated on the fly -->
    <atom:published>2004-09-30T01:51:53Z</atom:published>
    <atom:updated>2004-09-30T01:51:53Z</atom:updated>
    
    <!-- Author/creator of the Resource Map, as distinct from the creator of the aggregation it describes --> 
    <atom:source>
        <atom:generator uri="http://repository.tamu.edu/dspace-oai">Texas A&amp;M Repository OAI-PMH provider</atom:generator>
    </atom:source>
    
    <!-- Info about the aggregation (item) itself -->
    <atom:title>Modeling high-genus surfaces</atom:title>
    <atom:author>
        <atom:name>Srinivasan, Vinod</atom:name>
    </atom:author>
    <atom:category scheme="http://www.openarchives.org/ore/terms/" term="http://www.openarchives.org/ore/terms/Aggregation" label="Aggregation" />
    <atom:category scheme="http://www.openarchives.org/ore/atom/modified" term="2006-01-18T06:16:15Z"/>
    <atom:category scheme="http://www.dspace.org/objectModel/" term="DSpaceItem" label="DSpace Item"/>
    
    
    <!-- Aggregated Resources -->
    <atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://repository.tamu.edu/bitstream/handle/1969.1/316/etd-tamu-2004A-ARCH-Srinivasan-1.pdf?sequence=1"
        title="" type="application/pdf" />
    
    <!-- The extracted text of the PDF from the TEXT bundle. Might omit this, see Note #3. -->
    <atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://repository.tamu.edu/bitstream/handle/1969.1/316/etd-tamu-2004A-ARCH-Srinivasan-1.pdf.txt?sequence=2"
        title="" type="text/plain" />
    
    <!-- The MODS metadata. -->
    <atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://repository.tamu.edu/bitstream/handle/1969.1/316/MODS.xml?sequence=3"
        title="MODS metadata for this ETD" type="text/xml" />
    
    <!-- The METS metadata. -->
    <atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
        href="http://repository.tamu.edu/bitstream/handle/1969.1/316/METS.xml?sequence=4"
        title="METS representation for this asset (all bitstreams and their relationships)" type="text/xml" />
    
    <!-- Additional information about the individual resources -->
    <oreatom:triples>
        <rdf:Description rdf:about="http://repository.tamu.edu/dspace-oai/metadata/handle/1969.1/316/ore.xml">
            <rdf:type rdf:resource="http://www.dspace.org/objectModel/DSpaceItem"/>
            <dcterms:modified>2006-01-18T06:16:15Z</dcterms:modified> 
        </rdf:Description>
        <rdf:Description rdf:about="http://repository.tamu.edu/bitstream/handle/1969.1/316/etd-tamu-2004A-ARCH-Srinivasan-1.pdf?sequence=1">
            <rdf:type rdf:resource="http://www.dspace.org/objectModel/DSpaceBitstream"/>
            <dcterms:description>CONTENT</dcterms:description>
        </rdf:Description>
        <rdf:Description rdf:about="http://repository.tamu.edu/bitstream/handle/1969.1/316/etd-tamu-2004A-ARCH-Srinivasan-1.pdf.txt?sequence=2">
            <rdf:type rdf:resource="http://www.dspace.org/objectModel/DSpaceBitstream"/>
            <dcterms:description>TEXT</dcterms:description>
        </rdf:Description>
        <rdf:Description rdf:about="http://repository.tamu.edu/bitstream/handle/1969.1/316/MODS.xml?sequence=3">
            <rdf:type rdf:resource="http://www.dspace.org/objectModel/DSpaceBitstream"/>
            <dcterms:description>METADATA</dcterms:description>
        </rdf:Description>
        <rdf:Description rdf:about="http://repository.tamu.edu/bitstream/handle/1969.1/316/METS.xml?sequence=4">
            <rdf:type rdf:resource="http://www.dspace.org/objectModel/DSpaceBitstream"/>
            <dcterms:description>METADATA</dcterms:description>
        </rdf:Description>
    </oreatom:triples>
    
</atom:entry>



_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel
