The Texas Digital Library (TDL) has been investigating the use of ORE
and OAI-PMH to bring ETDs from various schools across Texas together in
a federated collection. I would like to briefly cover the background
and the design we are considering, and to ask for any feedback or
comments on this project. Our primary use case is this: several IRs
across the state hold ETD collections for their respective
institutions, and we would like to create a single federated collection
that aggregates those ETDs and keeps itself automatically updated.
Over the last several months TAMU Libraries have been following the
development of the OAI Object Reuse and Exchange (ORE) specification.
Its primary feature is the ORE Model, an abstract data model for
expressing relationships between web resources. The specification's
term for such a relationship is “Aggregation”, and the web resources it
relates are called “Aggregated Resources”. A concrete representation
of an Aggregation in some readable format is called a “Resource Map”.
Given the potential of this new format for repository interoperability and
the wide adoption of DSpace in the Texas Digital Library, we are
currently looking at adding OAI-ORE support to DSpace. This would
include the capability to both disseminate and harvest ORE objects.
Our approach is as follows.
First we need a way to disseminate ORE Resource Maps from a DSpace
repository. This is a two-part problem. The first part is creating a
mapping between the DSpace architecture and the ORE data model; in
other words, generating ORE Resource Maps from DSpace objects. This
also requires a specific serialization format to express the abstract
Resource Map as a concrete and usable representation. For this we
chose Atom XML. Following the documentation and examples from the ORE
specifications, specifically “Resource Map Implementation in Atom”,
the mapping was fairly straightforward to create. I've attached an
example copy of such a mapping.
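To make the mapping step a little more concrete, here is a minimal Python sketch of the idea. It is only an illustration, not the DSpace crosswalk itself (which would be Java), and the function name build_resource_map and its parameters are hypothetical. It builds an atom:entry in which each bitstream becomes an ore:aggregates link, mirroring the attached example.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ORE_TERMS = "http://www.openarchives.org/ore/terms/"

# Serialize Atom elements with the "atom:" prefix, as in the attached example.
ET.register_namespace("atom", ATOM)


def build_resource_map(rem_uri, title, author, bitstreams):
    """Return an atom:entry element describing one item as an ORE Aggregation.

    bitstreams is a list of (url, mime_type) pairs; each pair becomes one
    ore:aggregates link. (Hypothetical helper, for illustration only.)
    """
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = rem_uri
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    author_el = ET.SubElement(entry, f"{{{ATOM}}}author")
    ET.SubElement(author_el, f"{{{ATOM}}}name").text = author
    # Mark the entry as an Aggregation.
    ET.SubElement(entry, f"{{{ATOM}}}category", {
        "scheme": ORE_TERMS,
        "term": ORE_TERMS + "Aggregation",
        "label": "Aggregation",
    })
    # One link per Aggregated Resource (DSpace bitstream).
    for url, mime_type in bitstreams:
        ET.SubElement(entry, f"{{{ATOM}}}link", {
            "rel": ORE_TERMS + "aggregates",
            "href": url,
            "type": mime_type,
        })
    return entry


entry = build_resource_map(
    "http://repository.tamu.edu/dspace-oai/metadata/handle/1969.1/316/ore.xml",
    "Modeling high-genus surfaces",
    "Srinivasan, Vinod",
    [("http://repository.tamu.edu/bitstream/handle/1969.1/316/MODS.xml?sequence=3",
      "text/xml")],
)
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

The sketch leaves out the timestamps, the source/generator element and the oreatom:triples block that appear in the attached example; those would be filled in from the item's own metadata.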
The second part is actually disseminating the results; that is,
providing the resource maps with a URI from which they can be
consistently accessed. Following the guidelines set forth in the
“Resource Map Discovery” guide, we chose to simply disseminate the ORE
resource maps through existing OAI-PMH means. This entailed the
creation of an ORE/Atom dissemination crosswalk in DSpace that serves
ORE as one of the available metadata formats. In addition to OAI-PMH,
the ORE resource maps are also provided by other services using the
same crosswalk, for example the Manakin XMLUI.
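As a rough illustration of what "ORE as one of the available metadata formats" means for a client, the Python sketch below builds a standard OAI-PMH GetRecord request. The verb and the identifier and metadataPrefix arguments come from the OAI-PMH protocol; the "/request" path, the "ore" prefix name and the "oai:repository.tamu.edu:..." identifier form are assumptions made for the example, not settled choices.

```python
from urllib.parse import urlencode


def get_record_url(base_url, identifier, metadata_prefix="ore"):
    """Build an OAI-PMH GetRecord URL asking for one item in a given format.

    The default prefix "ore" is an assumed name for the new format.
    """
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Hypothetical base URL and identifier, modeled on the attached example.
url = get_record_url(
    "http://repository.tamu.edu/dspace-oai/request",
    "oai:repository.tamu.edu:1969.1/316",
)
print(url)
```

A harvester could ask the same provider for oai_dc with the same call by passing a different metadata_prefix, which is the practical benefit of reusing the existing OAI-PMH machinery.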
Done in this fashion, the dissemination aspect of ORE support does not
require any core changes to DSpace itself. The changes are limited to
a new dissemination crosswalk, plus whatever changes Manakin and
the OAI webapp need in order to serve the resource maps directly
through a URI in addition to an OAI-PMH request.
The harvesting aspect is more complex and requires more extensive
changes to core DSpace. First and foremost, DSpace needed the ability
to contact remote OAI-PMH providers and harvest data from them. This
would make a DSpace repository not just a data provider, but
potentially also a service provider under the OAI-PMH architecture.
Creating such a harvester, and giving it the ability to perform
automatic, recurring updates, entailed the addition of a new table to
the DSpace database to relate collections with their harvesting
information.
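The sketch below shows, in Python, the kind of per-collection state such a table might hold and how it could drive recurring updates. The field names, the HarvestSettings class and the set name in the usage note are all hypothetical and do not reflect the actual table layout; the only protocol facts relied on are the OAI-PMH ListRecords verb and its optional "from" and "set" arguments.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlencode


@dataclass
class HarvestSettings:
    """One row relating a local collection to its remote OAI-PMH source."""
    collection_id: int
    base_url: str
    set_spec: str
    metadata_prefix: str
    last_harvested: Optional[datetime] = None  # None means never harvested


def list_records_url(settings):
    """Build a ListRecords URL; incremental once a harvest has completed."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": settings.metadata_prefix,
        "set": settings.set_spec,
    }
    if settings.last_harvested is not None:
        # Ask only for records changed since the last run. This assumes the
        # provider supports seconds granularity for datestamps.
        params["from"] = settings.last_harvested.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{settings.base_url}?{urlencode(params)}"


def mark_harvested(settings, when=None):
    """Record the completion time so the next run is incremental."""
    settings.last_harvested = when or datetime.now(timezone.utc)
```

For example, with a made-up provider and set name:

```python
s = HarvestSettings(1, "http://example.org/oai/request", "hdl_1969.1_2", "ore")
print(list_records_url(s))   # full harvest, no "from" argument
mark_harvested(s)
print(list_records_url(s))   # later runs request only newer records
```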
Once a basic OAI-PMH harvester was complete, it was extended to also
function as an ORE harvester by the addition of the appropriate
ingestion crosswalk. The OREIngestionCrosswalk can parse an ORE
document, build a basic DSpace item and then fill in its metadata and
bitstreams by following the URI references in the Resource Map and
generating a secondary OAI-PMH request for descriptive metadata. Once
all these pieces were completed, their capabilities needed to be made
usable through DSpace’s interfaces, such as the command-line
interface, the Manakin XMLUI, the JSPUI and so forth.
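The first step of ingestion, pulling the Aggregated Resource URIs out of a Resource Map, can be sketched as follows. This is a Python illustration under stated assumptions, not the OREIngestionCrosswalk itself: the function name aggregated_resources is hypothetical, and the sample document is a cut-down version of the attached example.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"


def aggregated_resources(resource_map_xml):
    """Return (href, mime_type) for every ore:aggregates link in the document."""
    root = ET.fromstring(resource_map_xml)
    # iter() also finds links when the entry is wrapped in another element,
    # such as an OAI-PMH response envelope.
    return [
        (link.get("href"), link.get("type"))
        for link in root.iter(ATOM + "link")
        if link.get("rel") == AGGREGATES
    ]


sample = """<atom:entry xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:link rel="alternate" type="text/html"
             href="http://handle.tamu.edu/1969.1/316"/>
  <atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
             href="http://repository.tamu.edu/bitstream/handle/1969.1/316/MODS.xml?sequence=3"
             type="text/xml"/>
</atom:entry>"""

for href, mime_type in aggregated_resources(sample):
    print(href, mime_type)
```

Each returned URI would then be fetched and stored as a bitstream on the new item, with the descriptive metadata coming from the secondary OAI-PMH request described above.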
Furthermore, because the harvests can be made automatic and recurring,
a harvested DSpace collection can be kept “in sync” with the external
DSpace collection from which it is harvested. This is the primary use
case that motivated TDL
to look into using ORE for interoperability between the many IRs under
its umbrella. The potential, however, is even greater. Because ORE
resource maps simply describe aggregations of web-accessible resources
without regard to how that access is made possible, this could in
theory allow for seamless interoperability between a DSpace IR and any
other ORE-compliant service.
Feedback is definitely welcome, so please reply if you have comments,
questions, or suggestions.
Alexey Maslov
<?xml version="1.0" encoding="UTF-8"?>
<!-- Creating a simple mapping from the DSpace data model into an atom feed -->
<atom:entry xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:ore="http://www.openarchives.org/ore/terms/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:oreatom="http://www.openarchives.org/ore/atom/"
xmlns:dcterms="http://purl.org/dc/terms/">
<atom:id>http://repository.tamu.edu/dspace-oai/metadata/handle/1969.1/316/ore.xml</atom:id>
<atom:link rel="alternate" type="text/html" href="http://handle.tamu.edu/1969.1/316"/>
<!-- URI-A -->
<atom:link rel="http://www.openarchives.org/ore/terms/describes" href="http://repository.tamu.edu/dspace-oai/metadata/handle/1969.1/316/ore.xml"/>
<!-- URI-R -->
<atom:link rel="self" href="http://repository.tamu.edu/dspace-oai/metadata/handle/1969.1/316/ore.xml#atom" type="application/atom+xml"/>
<!-- Update time (now); generated on the fly -->
<atom:published>2004-09-30T01:51:53Z</atom:published>
<atom:updated>2004-09-30T01:51:53Z</atom:updated>
<!-- Author/creator of the Resource Map, as distinct from the creator of the aggregation it describes -->
<atom:source>
<atom:generator uri="http://repository.tamu.edu/dspace-oai">Texas A&amp;M Repository OAI-PMH provider</atom:generator>
</atom:source>
<!-- Info about the aggregation (item) itself -->
<atom:title>Modeling high-genus surfaces</atom:title>
<atom:author>
<atom:name>Srinivasan, Vinod</atom:name>
</atom:author>
<atom:category scheme="http://www.openarchives.org/ore/terms/" term="http://www.openarchives.org/ore/terms/Aggregation" label="Aggregation" />
<atom:category scheme="http://www.openarchives.org/ore/atom/modified" term="2006-01-18T06:16:15Z"/>
<atom:category scheme="http://www.dspace.org/objectModel/" term="DSpaceItem" label="DSpace Item"/>
<!-- Aggregated Resources -->
<atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
href="http://repository.tamu.edu/bitstream/handle/1969.1/316/etd-tamu-2004A-ARCH-Srinivasan-1.pdf?sequence=1"
title="" type="application/pdf" />
<!-- The extracted text of the PDF from the TEXT bundle. Might omit this, see Note #3. -->
<atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
href="http://repository.tamu.edu/bitstream/handle/1969.1/316/etd-tamu-2004A-ARCH-Srinivasan-1.pdf.txt?sequence=2"
title="" type="text/plain" />
<!-- The MODS metadata. -->
<atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
href="http://repository.tamu.edu/bitstream/handle/1969.1/316/MODS.xml?sequence=3"
title="MODS metadata for this ETD" type="text/xml" />
<!-- The METS metadata. -->
<atom:link rel="http://www.openarchives.org/ore/terms/aggregates"
href="http://repository.tamu.edu/bitstream/handle/1969.1/316/METS.xml?sequence=4"
title="METS representation for this asset (all bitstreams and their relationships)" type="text/xml" />
<!-- Additional information about the individual resources -->
<oreatom:triples>
<rdf:Description rdf:about="http://repository.tamu.edu/dspace-oai/metadata/handle/1969.1/316/ore.xml">
<rdf:type rdf:resource="http://www.dspace.org/objectModel/DSpaceItem"/>
<dcterms:modified>2006-01-18T06:16:15Z</dcterms:modified>
</rdf:Description>
<rdf:Description rdf:about="http://repository.tamu.edu/bitstream/handle/1969.1/316/etd-tamu-2004A-ARCH-Srinivasan-1.pdf?sequence=1">
<rdf:type rdf:resource="http://www.dspace.org/objectModel/DSpaceBitstream"/>
<dcterms:description>CONTENT</dcterms:description>
</rdf:Description>
<rdf:Description rdf:about="http://repository.tamu.edu/bitstream/handle/1969.1/316/etd-tamu-2004A-ARCH-Srinivasan-1.pdf.txt?sequence=2">
<rdf:type rdf:resource="http://www.dspace.org/objectModel/DSpaceBitstream"/>
<dcterms:description>TEXT</dcterms:description>
</rdf:Description>
<rdf:Description rdf:about="http://repository.tamu.edu/bitstream/handle/1969.1/316/MODS.xml?sequence=3">
<rdf:type rdf:resource="http://www.dspace.org/objectModel/DSpaceBitstream"/>
<dcterms:description>METADATA</dcterms:description>
</rdf:Description>
<rdf:Description rdf:about="http://repository.tamu.edu/bitstream/handle/1969.1/316/METS.xml?sequence=4">
<rdf:type rdf:resource="http://www.dspace.org/objectModel/DSpaceBitstream"/>
<dcterms:description>METADATA</dcterms:description>
</rdf:Description>
</oreatom:triples>
</atom:entry>
_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel