Hi Garry,

This is a really interesting use case.

There are several ways to approach this, but first you should clarify
your requirements. Do you require real-time search or is harvesting
enough? By real-time search I mean that after an item is updated in
the source repository, any search on it immediately after that will
reflect the updated metadata. By harvesting I mean that the changes
will be reflected only after the next harvest.

The most straightforward solution would seem to be to set up a separate
DSpace repository that harvests the other repositories. Unfortunately,
it seems to me that the built-in OAI-PMH harvester in DSpace supports
only harvesting individual collections, i.e. not the whole repository.
I guess that's because OAI-PMH has no notion of a
community/collection hierarchy. If you choose to go this way, you
would have to write your own harvester and make it import items into
DSpace (there are several ways to do that).
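
To give an idea of what "write your own harvester" involves, here is a
minimal Python sketch of the harvesting half only. It pages through
ListRecords responses using the resumptionToken defined by OAI-PMH 2.0.
The function names, the base URL and the injectable fetch parameter are
my own assumptions for illustration, and importing the records into
DSpace is deliberately left out.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH 2.0 response envelope.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def harvest(base_url, metadata_prefix="oai_dc", fetch=http_fetch):
    """Yield (identifier, record_element) for every record the endpoint lists.

    Deleted records (header status="deleted") are yielded as well; the
    caller decides what to do with them.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        for record in root.iter(OAI_NS + "record"):
            header = record.find(OAI_NS + "header")
            yield header.findtext(OAI_NS + "identifier"), record
        # An absent or empty resumptionToken marks the last page.
        token = root.findtext(".//" + OAI_NS + "resumptionToken")
        if not token or not token.strip():
            break
        # resumptionToken is an exclusive argument: send it without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": token.strip()}
```

Usage would look like `for identifier, record in harvest(base_url): ...`,
where base_url is the OAI-PMH endpoint of each source repository (the
exact path depends on the installation).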

Hilton suggested VuFind. I have some experience with that and at this
time, I wouldn't recommend that solution because importing content via
XSL is slow (it took me about 30 minutes per 10 000 items). That's not
because of the harvesting itself (which is blazingly fast in DSpace
3.x), but because the importer in VuFind chops the results into
individual items and applies the XSL transformation to each item
separately. OTOH, import of MARC21 to VuFind is very fast and we
might get a MARC21 exporter in DSpace 4.0, but that's still an open
question.

Those solutions leveraged harvesting. Now for your real-time search options:

You could query Solr in each DSpace directly and either use a federated
search product like MetaLib (commercial) or write your own (it shouldn't
be very difficult, but you will have to take care of ranking the merged
result set). Search speed will depend on the speed of your slowest
DSpace source.
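
A rough Python sketch of the "write your own" variant follows. It sends
the same query to several Solr select handlers in parallel and merges
the hits. The function names and the select URLs you pass in are
assumptions, not anything DSpace ships. Solr scores from different
indexes are not directly comparable, so the sketch divides each score by
the best score of its own source; that is just one simple heuristic for
the ranking problem, not a recommended method.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def solr_search(select_url, query, rows=10):
    """Query one Solr select handler and return its docs, each with a score."""
    params = urllib.parse.urlencode(
        {"q": query, "rows": rows, "fl": "*,score", "wt": "json"}
    )
    with urllib.request.urlopen(select_url + "?" + params, timeout=10) as resp:
        return json.load(resp)["response"]["docs"]


def merge_results(results_by_source, rows=10):
    """Merge per-source hit lists, ranking by score relative to each source's best hit."""
    merged = []
    for source, docs in results_by_source.items():
        # Fall back to 1.0 so an empty or all-zero list never divides by zero.
        top = max((d["score"] for d in docs), default=0) or 1.0
        for d in docs:
            merged.append({**d, "source": source, "norm_score": d["score"] / top})
    merged.sort(key=lambda d: d["norm_score"], reverse=True)
    return merged[:rows]


def federated_search(select_urls, query, rows=10, search=solr_search):
    """Run the query against all sources concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(select_urls))) as pool:
        futures = {url: pool.submit(search, url, query, rows) for url in select_urls}
        # Waiting on every future means total latency follows the slowest source.
        results = {url: f.result() for url, f in futures.items()}
    return merge_results(results, rows)
```

Because every source has to answer before the merge, one slow or
unreachable repository holds up the whole search, so a timeout per
source (10 seconds here, an arbitrary choice) is worth having.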

REST API - I would discourage you from going this route at this time
for two reasons:
* It's not officially part of DSpace yet and may change in the future.
* It would be slower than searching Solr directly (at best, it would
be a wrapper on top of Solr).

Please clarify your requirements and feel free to ask for any details.

Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette
