https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662
--- Comment #291 from David Cook <[email protected]> ---

On one hand, I feel like we're so close with the current patches; they just need more unit tests. On the other hand, the unit tests for the task scheduling and concurrent processing are actually quite challenging. These patches are also a huge chunk of functionality, which increases Koha's overall size.

I am tempted to take this code and split it into two parts:

1. Koha plugin for Web functionality
2. Standalone OAI-PMH harvester

My thinking is that the Koha plugin will let you connect to a separately packaged OAI-PMH harvester API in order to add/start/stop/update/remove harvesting tasks. Easy!

The standalone OAI-PMH harvester will then take care of scheduling tasks and high-performance downloading of records, and it will have actions to take on downloaded record batches. Not too hard!

This is where things get interesting. Ideally, there would be a Koha API backed by a queue and Koha worker(s) to handle the processing of records, but that doesn't currently exist. I can use the Koha plugin to inject an API route, but there is no existing queue mechanism. Uh oh!

The API could be used to store the records in a database table, but then I would need a Koha-based worker to access that table and apply all the Koha-specific rules to the data. At this point it would be nice to have RabbitMQ for the queue, and then the Koha plugin could provide a Koha worker, I suppose, which a sysadmin could manually start. Then again, we don't have to have RabbitMQ: the Koha worker could just tap into the database directly (until RabbitMQ is available).

So in the end there are really 3 parts:

1. Koha plugin (Web functionality)
2. Koha plugin (Import Worker functionality)
3. Standalone OAI-PMH harvester

Alternatively, the import API could handle all the Koha-related processing. I could do some tests to see how fast the web API could process the data.
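To make the "worker taps into the database directly, until RabbitMQ is available" idea concrete, here is a minimal Go sketch. The point is that the import worker only ever sees a queue interface, so the backing store (a database table today, RabbitMQ later) could be swapped out transparently. All names here (HarvestedRecord, ImportQueue, runImportWorker) are hypothetical illustrations, not existing Koha or harvester code, and the in-memory queue stands in for the real persistent store:

```go
package main

import (
	"fmt"
	"sync"
)

// HarvestedRecord is a hypothetical shape for one downloaded OAI-PMH record.
type HarvestedRecord struct {
	Identifier string // OAI identifier, e.g. "oai:example.org:123"
	Metadata   string // raw metadata payload (MARCXML, Dublin Core, ...)
}

// ImportQueue is a stand-in for the real queue: a database table today,
// possibly RabbitMQ later. The worker only depends on this interface,
// so the backing store can change without touching the worker.
type ImportQueue interface {
	Enqueue(r HarvestedRecord)
	Dequeue() (HarvestedRecord, bool)
}

// memQueue is a minimal in-memory implementation for illustration only.
type memQueue struct {
	mu    sync.Mutex
	items []HarvestedRecord
}

func (q *memQueue) Enqueue(r HarvestedRecord) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.items = append(q.items, r)
}

func (q *memQueue) Dequeue() (HarvestedRecord, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.items) == 0 {
		return HarvestedRecord{}, false
	}
	r := q.items[0]
	q.items = q.items[1:]
	return r, true
}

// runImportWorker drains the queue, applying the Koha-specific import
// rules (represented here by a callback) to each record, and returns
// how many records it processed.
func runImportWorker(q ImportQueue, importRecord func(HarvestedRecord)) int {
	n := 0
	for {
		r, ok := q.Dequeue()
		if !ok {
			return n
		}
		importRecord(r)
		n++
	}
}

func main() {
	q := &memQueue{}
	q.Enqueue(HarvestedRecord{Identifier: "oai:example.org:1", Metadata: "<record/>"})
	q.Enqueue(HarvestedRecord{Identifier: "oai:example.org:2", Metadata: "<record/>"})
	imported := runImportWorker(q, func(r HarvestedRecord) {
		fmt.Println("imported", r.Identifier)
	})
	fmt.Println("total:", imported)
}
```

A sysadmin-started worker would just run this loop on a timer against the real table instead of an in-memory slice.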
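On the harvester side, the "high-performance downloading" with an internal queue maps naturally onto Go's concurrency primitives: a downloader goroutine feeds a bounded channel, and a slower uploader drains it toward the Koha import API, with the channel providing backpressure when the uploader falls behind. This is a hedged sketch under assumed names; the comments mark where real ListRecords paging and HTTP uploads would go:

```go
package main

import "fmt"

// harvest simulates the standalone harvester's pipeline: a downloader
// goroutine pushes record batches into a bounded internal queue, and
// the (slower) uploader consumes them. When the queue is full, the
// downloader blocks, so downloads never run unboundedly ahead of uploads.
func harvest(pages int, queueSize int) int {
	queue := make(chan string, queueSize) // bounded internal queue

	// Downloader goroutine: pages through the OAI-PMH endpoint.
	go func() {
		for i := 0; i < pages; i++ {
			// In reality: issue a ListRecords request, following
			// the resumptionToken from the previous response.
			queue <- fmt.Sprintf("record-batch-%d", i)
		}
		close(queue) // no more batches to download
	}()

	// Uploader (the slower side): pushes each batch to the Koha import API.
	uploaded := 0
	for batch := range queue {
		_ = batch // in reality: HTTP POST to the plugin-provided import route
		uploaded++
	}
	return uploaded
}

func main() {
	// Download 5 batches through an internal queue of depth 2.
	fmt.Println(harvest(5, 2)) // prints 5
}
```

Because the uploader is the only part that talks to Koha, swapping the Koha side from a database-table queue to RabbitMQ later would not touch this pipeline at all.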
The OAI-PMH download will probably always be faster than the upload, so the OAI-PMH harvester would need its own internal queue for the downloaded records, but that would keep the Koha side of things slimmer. Plus, if Koha did implement a message queue like RabbitMQ, that change could be made transparently on the Koha side without affecting the OAI-PMH harvester.

OK, so:

1. Koha plugin (Web UI to interact with OAI-PMH harvester)
2. Koha plugin (Import API to receive harvested OAI-PMH records)
3. Standalone OAI-PMH harvester

I think that this makes sense. People could use the plugin, and if they like it, we could try again to get it into Koha master.

I am actually interested in rewriting the OAI-PMH harvester in Golang to take advantage of its concurrent programming strengths. By using the Koha plugin to provide/consume APIs, we're able to use the best tools for the job for the actual OAI-PMH work.

Note too that the OAI-PMH harvesting itself isn't actually Koha-specific. The only Koha-specific aspects are the scheduling and the record import. There's no real need to have the OAI-PMH code in the Koha codebase.

-- 
You are receiving this mail because: You are watching all bug changes.
_______________________________________________
Koha-bugs mailing list
[email protected]
https://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/
