http://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=10662
--- Comment #44 from Andreas Hedström Mace <[email protected]> ---

(In reply to David Cook from comment #41)
> I agree once again with Katrin. I think I've said before (either here or
> via email) that using Zebra for matching can be very unreliable.
>
> Currently, I use the unique OAI-PMH identifiers to handle all harvested
> records, and that's quite robust, since that identifier should be
> persistent. However, that obviously doesn't help with matching OAI-PMH
> harvested records against local records created via other methods.

That matching rules should not rely on Zebra is something I think we can all agree on. But matching only via OAI-PMH identifiers would not, I think, work for harvests from union catalogs. If I understand it correctly, it would force all libraries that want to start using Koha with OAI-PMH harvests either to migrate using OAI-PMH or to end up with a duplicate of all their records (since none of the local records will have OAI-PMH identifiers). In our case that would be about 1.2 million duplicates.

> In the short-term, perhaps merging bibliographic records would have to
> occur manually. Or maybe a deduplication tool could be created to
> semi-automate that task... although I think that tool would have to
> prevent any deletion of OAI-PMH harvested records.

Handling 1.2 million duplicates manually, or even semi-automatically, will most likely not be possible. Although it might be difficult technically, I still think some sort of matching rules are necessary.

> Actually, this hearkens back to my previous comment. It would be good if
> each record had a simple way of identifying its origin. So you couldn't
> delete a record obtained via OAI-PMH unless its parent repository was
> deleted from Koha or unless you used an OAI-PMH management tool to delete
> records for that repository.
>
> I think providing this "source" or "origin" would need to be done
> consistently or rather... extensibly. I wouldn't want it to be OAI-PMH
> specific, as that would be short-sighted.
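For illustration, the identifier-based handling described above could be sketched roughly as follows. This is a hypothetical sketch, not actual Koha code: the function name, the in-memory `store`, and the sample identifier are all invented for the example.

```python
# Hypothetical sketch: use the persistent OAI-PMH identifier as the
# matching key for harvested records. Not actual Koha code.

def upsert_harvested_record(store, oai_identifier, marc_record):
    """Insert or update a harvested record keyed on its OAI-PMH identifier.

    `store` maps oai_identifier -> marc_record; a real implementation
    would use a database table rather than a dict.
    """
    if oai_identifier in store:
        # Same identifier seen before: update the existing harvested copy.
        store[oai_identifier] = marc_record
        return "updated"
    # The limitation discussed in the thread: a local record created by
    # other means carries no OAI-PMH identifier, so it can never match
    # here, and the harvest creates a duplicate instead.
    store[oai_identifier] = marc_record
    return "created"

store = {}
print(upsert_harvested_record(store, "oai:example.org:bib/1234", {"245a": "Title"}))
print(upsert_harvested_record(store, "oai:example.org:bib/1234", {"245a": "Title, rev."}))
```

The point of the sketch is the `else` branch: robust for repeated harvests of the same record, but blind to pre-existing local records, which is exactly why identifier-only matching produces duplicates when harvesting a union catalog into an already-populated Koha.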
Marking the source/origin of a record sounds like a good idea to me, if it can be easily incorporated.

> At the moment, everything that goes through svc/import_bib uses a
> webservices import_batch... but that's not very unique. It would be
> interesting to have unique identifiers for import sources. So you might
> use the svc/import_bib with the connexion_import_daemon.pl, or with
> MARCEdit, or your home-grown script, or whatever. It would be interesting
> to distinguish those separately... and maybe prevent writes/deletions for
> records that are entered via connexion_import_daemon.pl and home-grown
> script XYZ, while leaving ones imported via MARCEdit to be managed however
> since you just exported some original records and re-imported them via
> MARCEdit after making some changes.

I was going to ask how this would tie in with the development of the REST API, but David's comment #42 explains that.

-- 
You are receiving this mail because:
You are watching all bug changes.
_______________________________________________
Koha-bugs mailing list
[email protected]
http://lists.koha-community.org/cgi-bin/mailman/listinfo/koha-bugs
website : http://www.koha-community.org/
git : http://git.koha-community.org/
bugs : http://bugs.koha-community.org/
