DSpace folks,

For some time now the Texas Digital Library has been investigating
the use of OAI-ORE and OAI-PMH for handling ETDs from various
schools across Texas in a federated collection. Our primary use case
is still the same: several IRs across the state hold ETD collections
for their respective institutions, and we would like to create a
single federated collection that aggregates those ETDs and keeps
itself up to date automatically. To accomplish this, we have added
the ability to point a DSpace collection to an external OAI-PMH  
provider and harvest its items into the local repository. If the  
remote repository supports OAI-ORE (for example, another DSpace  
instance), the resource maps can be used to harvest bitstreams as  
well. We also implemented a scheduling system to run harvests on  
configured collections at set intervals.
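For readers new to OAI-PMH: a harvest of this kind comes down to repeated ListRecords requests against the provider's base URL, following the resumptionToken in each response until the provider stops returning one. The bash functions below are only a minimal sketch of how those request URLs are formed. They are not the harvester's code (the branch uses the bundled OCLC harvester2 library), and the function names, base URL and oai_dc prefix are illustrative assumptions. Per the OAI-PMH 2.0 spec, resumptionToken is an exclusive argument, so follow-up requests carry only the verb and the token.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percent-encode a string for use in a query string.
# Handles ASCII input only; multi-byte characters are not byte-encoded.
urlencode() {
  local s=$1 out='' c hex i
  for ((i = 0; i < ${#s}; i++)); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9._~-]) out+=$c ;;                     # RFC 3986 unreserved: keep
      *) printf -v hex '%%%02X' "'$c"; out+=$hex ;;   # anything else -> %XX
    esac
  done
  printf '%s' "$out"
}

# Hypothetical helper: build an OAI-PMH ListRecords request URL.
#   $1  provider base URL (assumed, e.g. http://repo.example.edu/oai/request)
#   $2  metadataPrefix (e.g. oai_dc)
#   $3  optional resumptionToken taken from the previous response
oai_list_records_url() {
  local base=$1 prefix=$2 token=${3:-}
  if [[ -n $token ]]; then
    # resumptionToken is exclusive: no metadataPrefix/set/from/until with it
    printf '%s?verb=ListRecords&resumptionToken=%s\n' "$base" "$(urlencode "$token")"
  else
    printf '%s?verb=ListRecords&metadataPrefix=%s\n' "$base" "$(urlencode "$prefix")"
  fi
}
```

The first request would be something like `curl -s "$(oai_list_records_url http://repo.example.edu/oai/request oai_dc)"`. Each later request passes the token from the previous response as the third argument. For an ORE-capable source, resource maps would be requested under an ORE metadata format instead of oai_dc. The exact prefix is whatever the provider lists in its ListMetadataFormats response.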

This update is to let you know that the bulk of the project has been  
completed and is currently undergoing testing. If you want to take a  
look, the SVN branch is available at:
https://source.tdl.org/svn/dspace/branches/dspace-1_5_0-with-harvesting/

We plan to port the code to later versions of DSpace and would like
it to be considered for inclusion in a future release.


The basic install and use instructions are as follows.

1. Check out the harvesting branch at:
https://source.tdl.org/svn/dspace/branches/dspace-1_5_0-with-harvesting/

2. Follow the normal installation instructions in dspace/docs/install.html,
with two exceptions:
   a) Before running "mvn package" for the first time, you'll need to
manually install a .jar into your local Maven repository. It is found in:
[dspace-source]/etc/oclc-harvester2-0.1.12.jar
The full command is:
mvn install:install-file -DgroupId=org.dspace -DartifactId=oclc-harvester2 -Dversion=0.1.12 -Dpackaging=jar -Dfile=[dspace-source]/etc/oclc-harvester2-0.1.12.jar
   b) There are some new settings in dspace.cfg. The ones of immediate
interest are "dspace.oai.url", the URL that ORE uses to give its
resources a permanent home, and "harvester.eperson", the EPerson under
whose authorization the automatic harvests are performed. The rest of
the configuration options are described in the configure.html
documentation.

3. Harvesting settings are collection-specific and can be configured
from JSPUI, XMLUI, or the command line.
   a) The command-line utility to configure and run harvests is  
currently executed via:
[dspace-source]/bin/dsrun org.dspace.app.harvest.Harvest
Use the -h flag for details.
   b) Both JSPUI and XMLUI support setting up a collection's harvest  
settings through its admin interface. In JSPUI, the harvest settings  
were added to the bottom of the Collection edit screen. In XMLUI, a
new tab was added to the Edit Collection and Control Panel screens.
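As an illustration only, the two exceptions in step 2 could be scripted roughly as below. The function names and the DSPACE_SRC and MVN variables are hypothetical. MVN defaults to mvn and can be set to echo for a dry run. The config check only looks for the two keys named above, "dspace.oai.url" and "harvester.eperson"; pass it the path to whichever dspace.cfg you are editing.

```shell
#!/usr/bin/env bash
# Sketch of step 2 as two hypothetical helper functions.
# DSPACE_SRC (assumed) points at [dspace-source];
# MVN may be overridden (e.g. MVN=echo) to print the command instead of running it.

# 2a) Install the bundled OCLC harvester jar into the local Maven repository.
install_oclc_harvester() {
  if [[ -z ${DSPACE_SRC:-} ]]; then
    echo "set DSPACE_SRC to your [dspace-source] directory" >&2
    return 1
  fi
  local jar="$DSPACE_SRC/etc/oclc-harvester2-0.1.12.jar"
  local mvn=${MVN:-mvn}
  if [[ ! -f $jar ]]; then
    echo "jar not found: $jar" >&2
    return 1
  fi
  "$mvn" install:install-file \
    -DgroupId=org.dspace \
    -DartifactId=oclc-harvester2 \
    -Dversion=0.1.12 \
    -Dpackaging=jar \
    -Dfile="$jar"
}

# 2b) Report whether the two harvesting settings are present and non-empty
# in the given dspace.cfg. A line like "#harvester.eperson = x" never matches
# (its key includes the '#'); if a key appears twice, the last one wins.
check_harvest_cfg() {
  local cfg=$1 key value missing=0
  for key in dspace.oai.url harvester.eperson; do
    value=$(awk -v k="$key" '
      index($0, "=") {
        name = substr($0, 1, index($0, "=") - 1)
        v = substr($0, index($0, "=") + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", name)
        gsub(/^[ \t]+|[ \t]+$/, "", v)
        if (name == k) last = v
      }
      END { print last }' "$cfg")
    if [[ -n $value ]]; then
      printf '%s = %s\n' "$key" "$value"
    else
      printf '%s is not set\n' "$key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `install_oclc_harvester` once before the first "mvn package", and `check_harvest_cfg` after editing dspace.cfg, catches the two most likely setup omissions. Neither step replaces the install.html instructions.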


-Alexey Maslov

_______________________________________________
Dspace-devel mailing list
Dspace-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-devel
