Yes, Open Library implemented it, and, of course, where it doesn't work
is where the data is pretty bad. If you prefer to err on the side of
merging, you can loosen the algorithm's weights. It's based on the
algorithm used for the University of California union catalog, which was exercised over
about 20 years. Last I was able to ascertain, the data elements are very
similar to the ones used (at least at the time) by WorldCat.
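
To make the idea of "loosening" concrete, here is a minimal sketch of weighted record matching. The field names, weights, and threshold are hypothetical, picked only for illustration; they are not the values used by Open Library or in the original spec. Records are assumed to be plain dicts of already-normalized strings.

```python
# Hypothetical weights and threshold, for illustration only. They are
# NOT the values used by Open Library or by the original merge spec.
WEIGHTS = {"isbn": 85, "title": 60, "author": 40, "publish_date": 20}
THRESHOLD = 120


def match_score(a, b, weights=WEIGHTS):
    """Sum the weights of fields present in both records with equal values."""
    score = 0
    for field, weight in weights.items():
        value_a, value_b = a.get(field), b.get(field)
        if value_a and value_b and value_a == value_b:
            score += weight
    return score


def is_duplicate(a, b, threshold=THRESHOLD, weights=WEIGHTS):
    """Treat two records as the same work when the score reaches the threshold."""
    return match_score(a, b, weights) >= threshold


# Two records that agree on title and author only (one lacks an ISBN).
rec_a = {"isbn": "9780140449136", "title": "crime and punishment",
         "author": "dostoyevsky", "publish_date": "2003"}
rec_b = {"isbn": None, "title": "crime and punishment",
         "author": "dostoyevsky", "publish_date": "2002"}

strict = is_duplicate(rec_a, rec_b)                 # threshold 120
loose = is_duplicate(rec_a, rec_b, threshold=100)   # errs toward merging
```

In this sketch, lowering the threshold (or raising individual weights) is what errs on the side of merging; a real implementation may also score partial matches rather than only exact ones.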
kc
On 8/22/13 1:21 PM, Michael Beccaria wrote:
Karen,
Do you have a sense of how well it actually works? Is Open Library implementing
it?
Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
[email protected]
Become a friend of Paul Smith's Library on Facebook today!
-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf Of Karen
Coyle
Sent: Thursday, August 22, 2013 11:53 AM
To: [email protected]
Subject: Re: [CODE4LIB] De-dup MARC Ebook records
The record matching algorithm used by the Open Library is available here:
https://github.com/openlibrary/openlibrary/tree/master/openlibrary/catalog/merge
The original spec, which may have changed in the implementation, is here:
http://kcoyle.net/merge.html
kc
On 8/22/13 8:07 AM, Michael Beccaria wrote:
Steve,
I don't think it's so much a matter of finding a control field (the closest match I
can use is ISBN or eISBN, which has its issues) as of normalizing the data in
the fields so that matches are produced. It will no doubt take some time to
figure out.
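
As a rough illustration of the kind of normalization meant here, the sketch below cleans up ISBN and title strings before comparison. The function names are hypothetical and the rules are assumptions (strip trailing qualifiers, convert ISBN-10 to ISBN-13, fold case, accents, and punctuation); real data will likely need more cases, and check digits on the input are not validated.

```python
import re
import unicodedata
from typing import Optional


def normalize_isbn(raw: str) -> Optional[str]:
    """Return a 13-digit ISBN string, or None if nothing ISBN-like is found."""
    # Take the leading run of ISBN characters, dropping qualifiers such
    # as "(electronic bk.)" that often follow the number.
    match = re.match(r"\s*([0-9Xx][0-9Xx\- ]*)", raw or "")
    if not match:
        return None
    digits = re.sub(r"[\- ]", "", match.group(1)).upper()
    if len(digits) == 13 and digits.isdigit():
        return digits
    if len(digits) == 10 and digits[:9].isdigit():
        # ISBN-10 to ISBN-13: prefix 978, then recompute the check digit.
        core = "978" + digits[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3)
                    for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None


def normalize_title(raw: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())
```

With this, "0-14-044913-2 (pbk.)" and "9780140449136 (electronic bk.)" reduce to the same key, although print and electronic ISBNs for the same title will still differ, which is one of the issues with matching on ISBN alone.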
Mike Beccaria
-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf
Of McDonald, Stephen
Sent: Friday, August 16, 2013 8:16 AM
To: [email protected]
Subject: Re: [CODE4LIB] De-dup MARC Ebook records
Michael Beccaria said:
Thanks for the replies. To clarify, I am working with 2 (or more in
the future) MARC records outside of the ILS. I've tried using
MarcEdit but my results did vary...not much overlap with the control
fields that were available to me. I have a feeling they are a bit
varied. I'm also messing around with marcXimiL a little but I'm
having trouble getting it to output any records at all. I also was
looking at the XC aggregation module but I was having trouble getting
that to work properly as well and the listserv was unresponsive. It
seemed like good software but it required me to set up an OAI harvest
source to allow it to ingest the records and that...well...enough is
enough... I think I will probably need to write something, and at
least that way I know what it will be doing rather than plowing
through software that has little to no support. Please feel free to let me know
of a particular strategy you think might work best in this regard...
If you couldn't get adequate deduping from the control fields available in
MarcEdit, what control fields do you think you need to dedup on? You
can actually specify any arbitrary field and subfield for deduping in MarcEdit.
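
For anyone who does end up writing something, the same idea (dedup on an arbitrary field and subfield) can be sketched outside MarcEdit roughly as follows. This is not how MarcEdit works internally; the record layout (a dict mapping a tag to a list of subfield dicts) and the function names are assumptions made for the example.

```python
from collections import defaultdict


def subfield_values(record, tag, code):
    """Collect every value of subfield `code` in fields with the given tag."""
    return [field[code] for field in record.get(tag, []) if code in field]


def group_by_key(records, tag, code, normalize=str.strip):
    """Map each normalized key to the indexes of the records that share it.

    Only keys seen in more than one record are returned.
    """
    groups = defaultdict(list)
    for index, record in enumerate(records):
        keys = {normalize(v) for v in subfield_values(record, tag, code)}
        for key in keys:
            if key:
                groups[key].append(index)
    return {key: idxs for key, idxs in groups.items() if len(idxs) > 1}


# Hypothetical sample data: tag -> list of {subfield code: value}.
records = [
    {"020": [{"a": "9780140449136"}], "245": [{"a": "Crime and punishment"}]},
    {"020": [{"a": "9780140449136 "}, {"a": "0140449132"}],
     "245": [{"a": "Crime and Punishment"}]},
    {"245": [{"a": "Other title"}]},
]

by_isbn = group_by_key(records, "020", "a")
by_title = group_by_key(records, "245", "a", normalize=str.lower)
```

Passing a different `normalize` function is where field-specific cleanup would plug in, so the choice of field and the choice of normalization stay independent.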
Steve McDonald
[email protected]
--
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet