https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662
--- Comment #319 from David Cook <[email protected]> ---
(In reply to Koha Team University Lyon 3 from comment #318)
> We would like to use an existing OAI-PMH harvester and 2 of them seem to be
> good candidates :
> - Catmandu harvester
> - HTTP::OAI::HARVESTER module
>
> Both are in perl. At the moment, we thought that the second could be a
> better choice because it's more up to date and we don't necessarily need to
> use Catmandu.

Currently, we're still stuck using HTTP::OAI 3.27 from 2011 in Koha for
OAI-PMH server functionality. Bug 17704 aims to get a later version of
HTTP::OAI working with Koha, but it's been open for about 6 years now.

(HTTP::OAI was also a dead project for a few years but it was resurrected in
2017 by one of the Catmandu authors. On that note, I think it is a good idea
to avoid Catmandu.)

One problem with HTTP::OAI that I encountered back in 2016 was that it needed
to parse the entire XML response into a DOM Document tree rather than
processing the response incrementally as it was parsed. This usually isn't a
problem because most repositories use resumptionToken elements and limit
responses to approximately 100 records. But LIBRIS in Sweden would stream the
entire response back without resumptionToken elements, so 1 XML response
could contain the entire catalogue's worth of records.

That said, in theory the HTTP::OAI module uses event-driven SAX XML parsing,
so it shouldn't be building a DOM Document tree from the response. Maybe the
dev environment I was using in 2016 didn't have the correct SAX parser
dependencies, so it was unintentionally falling back to a DOM-based parser
in lieu of the SAX parser.

Plus, I suppose we could say that the Koha OAI-PMH harvester doesn't support
OAI-PMH repositories that don't use resumptionToken elements for flow
control. Or, since HTTP::OAI is no longer dead, that issue can always be
pursued with the current maintainer.

So overall... HTTP::OAI is probably the way to go.
Just wanted to add a warning about my past experience with it.

> What we would like is to use all the import tools already existing in Koha
> (XSLT, Record matching rules, MARC modification templates, Stage marc for
> import, Manage MARC overlay rules).
>
> We would like to add an OAI-PMH setting (like Z39.50 / SRU) in the staff
> interface with URL, SET, XML Format, authentication login, biblio/authority
> records, deleted records handling, email for logs, XSLT file, encoding,
> items handling, profile import.

Sounds like a plan. I suspect it will involve a lot of testing. It might be
worthwhile to break some of that functionality out into separate tickets, so
that the whole patch set doesn't need to be re-tested for minor fixes outside
the core harvester functionality.

> Every harvesting would be scheduled only via the cronjobs.

That should make it easy to implement and test.
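
To make the DOM-versus-streaming concern in this comment concrete, here is a
minimal sketch of incremental parsing of a ListRecords response. It is written
in Python with the standard library only, purely as an illustration: it is not
Perl, and it does not show how HTTP::OAI works internally. The function name
iter_oai_response, the SAMPLE document and the oai:example.org identifiers are
all made up for the example. Only the OAI-PMH 2.0 namespace and the
record/header/identifier/resumptionToken element names come from the protocol.

```python
import io
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses, in ElementTree's "{uri}tag" notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def iter_oai_response(source):
    """Hypothetical helper: yield events while the response is being parsed.

    Yields ("record", identifier) as soon as each <record> is complete, and
    ("resumptionToken", token) if the response carries a non-empty token.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == OAI_NS + "record":
            identifier = elem.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
            yield ("record", identifier)
            # Drop the record's subtree once it has been handled. The emptied
            # element stays attached to its parent, so memory use is reduced
            # rather than strictly constant.
            elem.clear()
        elif elem.tag == OAI_NS + "resumptionToken":
            token = (elem.text or "").strip()
            if token:  # an empty token marks the last page of a list
                yield ("resumptionToken", token)


# Made-up response body, standing in for what a repository would return.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    for kind, value in iter_oai_response(io.BytesIO(SAMPLE)):
        print(kind, value)
```

With this shape, a response that never sends a resumptionToken (the LIBRIS
case described above) simply yields a long run of record events and no token
event, so the caller never has to hold the whole document in memory at once.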
