The record matching algorithm used by the Open Library is available here:
https://github.com/openlibrary/openlibrary/tree/master/openlibrary/catalog/merge

The original spec is here, though the implementation may have since diverged from it:

http://kcoyle.net/merge.html
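One common way to structure record matching is to compare two records field by field, add points when values agree, and call it a match once the total passes a threshold. The sketch below illustrates only that general idea. The fields, weights, threshold, and function names are hypothetical and are not taken from the Open Library code or the spec above, and field values are assumed to be normalized already.

```python
# A toy threshold-based matcher. The fields, weights, and threshold are
# hypothetical illustrations, NOT values from the Open Library merge code
# or the kcoyle.net spec. Field values are assumed to be normalized already.
WEIGHTS = {"isbn": 85, "title": 50, "date": 20, "publisher": 15}
THRESHOLD = 100


def match_score(a: dict, b: dict) -> int:
    """Add a field's weight when both records have it and the values agree."""
    score = 0
    for field, weight in WEIGHTS.items():
        va, vb = a.get(field), b.get(field)
        if va and vb and va == vb:
            score += weight
    return score


def is_match(a: dict, b: dict) -> bool:
    """Treat two records as the same item once the score reaches the threshold."""
    return match_score(a, b) >= THRESHOLD


if __name__ == "__main__":
    a = {"isbn": "9780316769488", "title": "catcher in the rye", "date": "1991"}
    b = {"isbn": "9780316769488", "title": "catcher in the rye", "date": "2001"}
    c = {"title": "catcher in the rye", "date": "1991"}
    print(match_score(a, b), is_match(a, b))  # 135 True
    print(match_score(a, c), is_match(a, c))  # 70 False
```

A fuller version might also subtract points when both records have a value for a field and the values conflict, so that, say, two different publication dates actively argue against a match.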

kc


On 8/22/13 8:07 AM, Michael Beccaria wrote:
Steve,
I don't think it's so much finding a control field (the closest match I can
use is the ISBN or eISBN, which has its issues) as normalizing the data in
the fields so that matches are produced. It will no doubt take some time to
figure out.
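As a minimal sketch of the kind of normalization Mike means, assuming Python 3 and MARC-style input strings such as an 020$a carrying a "(pbk.)" qualifier or a 245$a ending in ISBD punctuation, the hypothetical functions below convert ISBN-10s to ISBN-13 and fold titles into a comparable form. Normalizing ISBNs only helps when both records carry the same ISBN: a print ISBN and an eISBN for the same title are different numbers and will not match this way.

```python
import re
import unicodedata
from typing import Optional


def normalize_isbn(raw: str) -> Optional[str]:
    """Pull an ISBN out of a 020$a-style string and return it as ISBN-13.

    Hyphens and trailing qualifiers such as "(pbk.)" are ignored. Returns
    None when no 10- or 13-character ISBN is found. The input's own check
    digit is not validated.
    """
    m = re.search(r"\d{13}|\d{9}[\dX]", raw.replace("-", "").upper())
    if m is None:
        return None
    isbn = m.group(0)
    if len(isbn) == 13:
        return isbn
    # ISBN-10 -> ISBN-13: prefix 978, drop the old check digit, and
    # recompute it using alternating weights of 1 and 3.
    core = "978" + isbn[:9]
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


LEADING_ARTICLES = ("the ", "a ", "an ")  # English only, an assumption


def normalize_title(raw: str) -> str:
    """Fold case, diacritics, punctuation, and a leading English article."""
    # Decompose accented letters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase, turn punctuation (e.g. ISBD " /" or " :") into spaces,
    # and collapse runs of whitespace.
    text = re.sub(r"[^\w\s]", " ", text.lower())
    text = " ".join(text.split())
    for article in LEADING_ARTICLES:
        if text.startswith(article):
            return text[len(article):]
    return text


if __name__ == "__main__":
    print(normalize_isbn("0-316-76948-7 (pbk.)"))  # 9780316769488
    print(normalize_title("The catcher in the rye /"))  # catcher in the rye
```

Which punctuation to strip and which articles to drop are judgment calls that depend on the languages and cataloging practices in the record sets being compared.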

Mike Beccaria
Systems Librarian
Head of Digital Initiative
Paul Smith's College
518.327.6376
[email protected]


-----Original Message-----
From: Code for Libraries [mailto:[email protected]] On Behalf Of 
McDonald, Stephen
Sent: Friday, August 16, 2013 8:16 AM
To: [email protected]
Subject: Re: [CODE4LIB] De-dup MARC Ebook records

Michael Beccaria said:
> Thanks for the replies. To clarify, I am working with two (or more in
> the future) sets of MARC records outside of the ILS. I've tried using
> MarcEdit, but my usage did vary...not much overlap with the control
> fields that were available to me. I have a feeling they are a bit
> varied. I'm also messing around with marcXimiL a little, but I'm having
> trouble getting it to output any records at all. I also looked at the
> XC aggregation module, but I had trouble getting that to work properly
> as well, and the listserv was unresponsive. It seemed like good
> software, but it required me to set up an OAI harvest source to allow
> it to ingest the records and that...well...enough is enough... I think
> I will probably need to write something; at least that way I know what
> it will be doing, rather than plowing through software that has little
> to no support. Please feel free to let me know of any particular
> strategy you think might work best in this regard...
If you couldn't get adequate deduping from the control fields available in
MarcEdit's deduping tool, what control fields do you think you need to dedup on?
You can actually specify any field and subfield for deduping in MarcEdit.
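For a home-grown tool like the one Mike describes, deduping on an arbitrary field and subfield can be sketched in a few lines of Python. This is not how MarcEdit implements it. The record model below is a hypothetical stand-in for whatever a MARC parser (pymarc is one option in Python) would produce, and `dedup`, `match_keys`, `default_key`, and `isbn_key` are illustrative names.

```python
from typing import Callable, Dict, List, Set

# Hypothetical, simplified record model (not a real MARC library's API):
# tag -> list of field occurrences, each a dict of subfield code -> value.
# Repeated subfields within one field are not modeled.
Record = Dict[str, List[Dict[str, str]]]


def default_key(value: str) -> str:
    """Minimal normalization: trim whitespace and fold case."""
    return value.strip().lower()


def match_keys(record: Record, tag: str, code: str,
               normalize: Callable[[str], str]) -> Set[str]:
    """Collect the normalized values of every tag$code in one record."""
    keys = set()
    for field in record.get(tag, []):
        value = field.get(code)
        if value:
            key = normalize(value)
            if key:
                keys.add(key)
    return keys


def dedup(records: List[Record], tag: str, code: str,
          normalize: Callable[[str], str] = default_key) -> List[Record]:
    """Keep the first record for each key and drop later records sharing one.

    Records with no usable tag$code value cannot be compared, so they are kept.
    """
    seen: Set[str] = set()
    kept = []
    for record in records:
        keys = match_keys(record, tag, code, normalize)
        if keys & seen:
            continue  # shares a key with an earlier record: treat as a duplicate
        seen |= keys
        kept.append(record)
    return kept


def isbn_key(value: str) -> str:
    """Example 020$a normalizer: drop hyphens and any trailing qualifier."""
    parts = value.replace("-", "").split()
    return parts[0] if parts else ""


if __name__ == "__main__":
    records = [
        {"020": [{"a": "978-0-316-76948-8 (pbk.)"}], "245": [{"a": "The catcher in the rye /"}]},
        {"020": [{"a": "9780316769488"}], "245": [{"a": "Catcher in the rye"}]},
        {"245": [{"a": "Untitled record"}]},
    ]
    print(len(dedup(records, "020", "a", isbn_key)))  # 2
    print(len(dedup(records, "245", "a")))  # 3
```

The second call shows why normalization matters: with only trimming and case folding, "The catcher in the rye /" and "Catcher in the rye" produce different keys, so nothing is deduped on 245$a. The first-record-wins policy is also a simplification; a real tool might merge the duplicates or report them for review instead of silently dropping them.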

                                        Steve McDonald
                                        [email protected]

--
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
