https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662
--- Comment #314 from David Cook <[email protected]> ---
Fortunately, since we have RabbitMQ now, some of it should be a lot easier! (I was just looking at bug 27421, which asynchronously stages, imports, and reverts MARC imports. It could be helpful for this work too.)

*The hard part for the OAI-PMH harvester in Koha is the task scheduling.* Koha doesn't have any way to let users define their own task schedules. (Back in 2018, Frido also mentioned that Koha support companies might not want to let librarians set task schedules for OAI-PMH anyway, for performance/API rate limiting reasons.)

That said, if it doesn't matter to you whether librarians control the scheduling, you could just create a cronjob that sends OAI-PMH tasks to RabbitMQ (or a plugin that uses the nightly plugin cronjob). Then all that's left is to create a Koha::BackgroundJob::OAIPMHHarvest class.

--

The background job class could probably encapsulate the entire task. (For bug 10662, the requirement was to download records every 3 seconds, so I had to split the harvest/download and import tasks into two separate asynchronous tasks to achieve fast enough download speeds.)

For bug 10662, I also had a requirement to handle very long XML streams over HTTP rather than the usual short XML responses. That is technically allowed by the OAI-PMH specification, but it meant writing a custom downloader. It was high performance, but it meant I had to add even more code.

In theory, you might be able to have Koha::BackgroundJob::OAIPMH::Download, Koha::BackgroundJob::OAIPMH::Stage and Koha::BackgroundJob::OAIPMH::Import classes. The scheduler (e.g. a cronjob) could enqueue a Koha::BackgroundJob::OAIPMH::Download task, which downloads the records. That task could then enqueue a Koha::BackgroundJob::OAIPMH::Stage task to stage them (i.e. run the matcher/duplicate finder and ideally do some OAI-PMH specific checks), and the stage task could enqueue the final Koha::BackgroundJob::OAIPMH::Import task to run the actual import.
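
To make the chaining idea concrete, here is a rough, language-neutral sketch in Python rather than Koha's Perl. Everything in it is an assumption for illustration: the in-memory `job_queue` stands in for RabbitMQ, the job names and handlers (`download`, `stage`, `import_records`) are hypothetical and are not Koha::BackgroundJob APIs, and the "download" step returns canned records instead of talking to an OAI-PMH server.

```python
from collections import deque

# Hypothetical in-memory queue standing in for RabbitMQ.
job_queue = deque()
# Hypothetical stand-in for the catalogue that imports write to.
imported = []


def enqueue(job_type, payload):
    """Add a job to the back of the queue."""
    job_queue.append((job_type, payload))


def download(payload):
    # Placeholder for the real OAI-PMH HTTP harvest: returns canned records.
    records = [
        {"identifier": "oai:example.org:1", "datestamp": "2021-01-02"},
        {"identifier": "oai:example.org:1", "datestamp": "2021-01-02"},
        {"identifier": "oai:example.org:2", "datestamp": "2021-01-03"},
    ]
    # Hand the result to the next stage instead of doing everything in one job.
    enqueue("stage", {"repository": payload["repository"], "records": records})


def stage(payload):
    # Placeholder for matching/duplicate detection:
    # keep the last record seen for each identifier.
    unique = {}
    for record in payload["records"]:
        unique[record["identifier"]] = record
    enqueue(
        "import",
        {"repository": payload["repository"], "records": list(unique.values())},
    )


def import_records(payload):
    # Placeholder for the actual import into the catalogue.
    imported.extend(payload["records"])


HANDLERS = {"download": download, "stage": stage, "import": import_records}


def run_worker():
    """Single worker: take jobs one at a time, in queue order."""
    processed = []
    while job_queue:
        job_type, payload = job_queue.popleft()
        HANDLERS[job_type](payload)
        processed.append(job_type)
    return processed
```

A scheduler (for example a cronjob) would only need to call `enqueue("download", {"repository": "https://example.org/oai"})`; a subsequent `run_worker()` then processes the download, stage and import jobs in that order. Because each stage is its own queue entry, any job enqueued in the meantime gets a turn between stages.
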
(The advantage of breaking it into 3 different tasks is that Koha by default only has 1 background job worker, so very long tasks could prevent other tasks from running in a timely way.)

(However, if we had more than 1 background job worker, I'd be a little concerned about race conditions where Worker B tries to import Record 1-A after Worker A has imported Record 1-B, where Record 1-A is older than Record 1-B. There needs to be a sanity check to make sure that records only overwrite older records.)

Depending on how bug 27421 works, Koha::BackgroundJob::OAIPMH::Stage and Koha::BackgroundJob::OAIPMH::Import could potentially be subclasses of Koha::BackgroundJob::StageMARCForImport and of the corresponding import job class from that bug, respectively. That said, I don't really like Koha's built-in MARC import classes for OAI-PMH, because once records are staged they're imported without any sanity checks. Also, record matching rules are user-controllable and solely MARC-based, so they're unreliable and not great for matching incoming OAI-PMH records to previously harvested records.
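
The sanity check mentioned above (records should only overwrite older records) could look roughly like the following Python sketch. It is an illustration under stated assumptions, not Koha code: the `store` dict is a hypothetical stand-in for a table of previously harvested records, `import_record` and `parse_datestamp` are made-up helper names, and records are matched on the OAI-PMH identifier and compared by OAI-PMH datestamp rather than by MARC matching rules.

```python
from datetime import datetime, timezone


def parse_datestamp(value):
    """Parse an OAI-PMH UTC datestamp (YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ)."""
    fmt = "%Y-%m-%dT%H:%M:%SZ" if "T" in value else "%Y-%m-%d"
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


def import_record(store, identifier, datestamp, marcxml):
    """Write the record only if it is newer than the stored copy.

    `store` is a hypothetical dict keyed by OAI-PMH identifier.
    Returns True if the record was written, False if it was skipped.
    """
    incoming = parse_datestamp(datestamp)
    current = store.get(identifier)
    if current is not None and current["datestamp"] >= incoming:
        # The stored copy is the same age or newer: do not overwrite it.
        return False
    store[identifier] = {"datestamp": incoming, "marcxml": marcxml}
    return True
```

With this check, importing the newer version first and the older version second leaves the newer version in place. With more than one worker, the lookup and the write would also need to happen atomically (for example inside a database transaction or under a lock), which this single-process sketch does not attempt to show.
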
