Greetings everyone,

I have recently run into a problem harvesting metadata from sites running
the open-source software Geonetwork. In the past, I have been able to check
an OAI provider by taking its baseURL and issuing a ListMetadataFormats
request, for example:
http://www.fao.org/geonetwork/srv/en/main.home/oaipmh?verb=ListMetadataFormats
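For anyone who wants to repeat this check in a script, here is a minimal sketch using only the Python standard library. It assumes a standard OAI-PMH 2.0 response, where the metadataFormat elements live in the http://www.openarchives.org/OAI/2.0/ namespace. The function names are my own, not part of DSpace or Geonetwork.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace of the OAI-PMH 2.0 response envelope (OAI-PMH, ListMetadataFormats, ...)
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def list_metadata_formats_url(base_url: str) -> str:
    # An OAI-PMH request is a plain HTTP GET with the verb as a query argument
    return base_url + "?" + urlencode({"verb": "ListMetadataFormats"})


def parse_metadata_formats(xml_text: str) -> dict[str, str]:
    """Map each advertised metadataPrefix to its advertised metadataNamespace."""
    root = ET.fromstring(xml_text)
    formats = {}
    for fmt in root.iter(f"{{{OAI_NS}}}metadataFormat"):
        prefix = fmt.findtext(f"{{{OAI_NS}}}metadataPrefix", "").strip()
        # strip() removes stray whitespace/newlines inside the element text
        namespace = fmt.findtext(f"{{{OAI_NS}}}metadataNamespace", "").strip()
        formats[prefix] = namespace
    return formats
```

The response body can be fetched with urllib.request.urlopen(list_metadata_formats_url(base_url)).read() and passed to parse_metadata_formats, which makes it easy to see which namespace each provider advertises for oai_dc.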

Once I have verified that oai_dc is available, I have had little trouble
harvesting metadata from other OAI providers. With Geonetwork sites,
however, my system reports an error along the lines of:

The OAI server does not support this metadata format:
http://www.openarchives.org/OAI/2.0/oai_dc/

Exception:
org.dspace.harvest.OAIHarvester$HarvestingException: The OAI server does
not support this metadata format:
http://www.openarchives.org/OAI/2.0/oai_dc/
       at org.dspace.harvest.OAIHarvester.runHarvest(OAIHarvester.java:278)

However, the ListMetadataFormats response above does list oai_dc as a
supported metadata format. The one obvious difference I have noticed
between Geonetwork sites and the providers I have harvested successfully is
the metadataNamespace advertised for oai_dc in the ListMetadataFormats
response: the namespaces do not match.

Here is the oai_dc entry returned by a registered OAI data provider:
<metadataFormat>
<metadataPrefix>oai_dc</metadataPrefix>
<schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
<metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
</metadataFormat>

And here is the oai_dc entry returned by a Geonetwork site:
<metadataFormat>
<metadataPrefix>oai_dc</metadataPrefix>
<schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
<metadataNamespace>http://www.openarchives.org/OAI/2.0/</metadataNamespace>
</metadataFormat>
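To make the difference concrete, this small sketch parses the two entries quoted above and compares each advertised namespace with the target namespace declared by oai_dc.xsd (http://www.openarchives.org/OAI/2.0/oai_dc/). The Geonetwork entry advertises the OAI-PMH protocol namespace instead. The helper names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Target namespace of the oai_dc schema (oai_dc.xsd)
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

REGISTERED = """<metadataFormat>
<metadataPrefix>oai_dc</metadataPrefix>
<schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
<metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/
</metadataNamespace>
</metadataFormat>"""

GEONETWORK = """<metadataFormat>
<metadataPrefix>oai_dc</metadataPrefix>
<schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
<metadataNamespace>http://www.openarchives.org/OAI/2.0/</metadataNamespace>
</metadataFormat>"""


def advertised_namespace(snippet: str) -> str:
    # The snippets are quoted without the OAI-PMH xmlns, so plain tag names work here
    return ET.fromstring(snippet).findtext("metadataNamespace", "").strip()


def matches_oai_dc(snippet: str) -> bool:
    # Exact string comparison, as a namespace-based lookup would do
    return advertised_namespace(snippet) == OAI_DC_NS


print(matches_oai_dc(REGISTERED))  # True
print(matches_oai_dc(GEONETWORK))  # False
```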

Could this namespace mismatch be what prevents harvesting from Geonetwork
sites? When I request records with metadataPrefix=oai_dc, the records come
back as Dublin Core elements, so I am not sure why DSpace concludes that it
cannot harvest from these sites. Am I missing something fundamental here?
Has anyone run into this kind of problem before?
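The error message quotes the oai_dc namespace URL rather than the prefix, which suggests (I have not confirmed this in the DSpace source) that the harvester resolves the metadata format by namespace instead of by prefix. The sketch below models that kind of lookup with hypothetical function and exception names. It shows why a provider that lists oai_dc under the wrong namespace would be rejected, and what a prefix fallback would look like; the fallback is an illustration, not an existing DSpace option.

```python
class HarvestingError(Exception):
    """Hypothetical stand-in for the harvester's exception type."""


def resolve_prefix(formats: dict[str, str], namespace: str) -> str:
    # Namespace-keyed lookup: return the prefix whose advertised namespace matches exactly
    for prefix, advertised in formats.items():
        if advertised == namespace:
            return prefix
    raise HarvestingError(
        f"The OAI server does not support this metadata format: {namespace}")


def resolve_prefix_lenient(formats: dict[str, str], namespace: str,
                           expected_prefix: str) -> str:
    # Hypothetical workaround: fall back to a well-known prefix when no namespace matches
    try:
        return resolve_prefix(formats, namespace)
    except HarvestingError:
        if expected_prefix in formats:
            return expected_prefix
        raise


OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
geonetwork_formats = {"oai_dc": "http://www.openarchives.org/OAI/2.0/"}
print(resolve_prefix_lenient(geonetwork_formats, OAI_DC_NS, "oai_dc"))  # oai_dc
```

With geonetwork_formats, resolve_prefix(geonetwork_formats, OAI_DC_NS) raises the same kind of error shown in the stack trace, even though the oai_dc prefix is present.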

Any help that you can provide would be greatly appreciated!

Thanks,

Bob Torgerson
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech