[CODE4LIB] viaf and the levenshtein algorithm

Eric Lease Morgan Tue, 07 Jun 2016 02:49:55 -0700

In the past few weeks I have had some interesting experiences with WorldCat, 
VIAF, and the Levenshtein algorithm. [1, 2]


In short, I was given a set of authority records with the goal of associating 
each name with a VIAF identifier. To accomplish this goal I first created a 
rudimentary database — an easily parsed list of MARC 1xx fields. I then looped 
through the database, and searched VIAF via the AutoSuggest interface looking 
for one-to-one matches. If found, I updated my database with the VIAF 
identifier. The AutoSuggest interface was fast, but it was able to associate 
only 20% of my names with identifiers. (Moreover, I don’t know how it works; 
AutoSuggest is a “black box” technology.)
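
For illustration, the first pass might look something like the Python sketch below. It is not the code actually used. It reads "one-to-one match" as "the response contains exactly one suggestion", and it assumes an AutoSuggest URL of https://viaf.org/viaf/AutoSuggest plus a JSON response carrying a "result" list whose items have a "viafid" key. The function names (fetch_suggestions, single_match, link_names) are hypothetical; check the endpoint and field names against the current VIAF documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint, not taken from the post; verify before relying on it.
AUTOSUGGEST_URL = "https://viaf.org/viaf/AutoSuggest"


def fetch_suggestions(name):
    """Query AutoSuggest for a name and return the decoded JSON (network call)."""
    url = AUTOSUGGEST_URL + "?" + urlencode({"query": name})
    with urlopen(url) as response:
        return json.load(response)


def single_match(payload):
    """Return the identifier when there is exactly one suggestion, else None."""
    # "result" and "viafid" are assumed field names; "result" may be null.
    results = (payload or {}).get("result") or []
    if len(results) == 1:
        return results[0].get("viafid")
    return None


def link_names(names, fetch=fetch_suggestions):
    """Map each name to an identifier, or to None when no single match exists."""
    return {name: single_match(fetch(name)) for name in names}
```

Passing the fetch function as a parameter keeps the matching rule separate from the network call, so the rule can be exercised against canned responses before any request is sent to VIAF.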

I then looped through the database again, but this time I queried VIAF using 
the SRU interface. Searches often returned many hits, not just one-to-one 
matches, but through the use of the Levenshtein algorithm I was able to 
intelligently select items from the search results and update my database 
accordingly. [3] Through the use of the SRU/Levenshtein combination, I was able 
to associate another 50-55% of my names with identifiers.
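
To make the selection step concrete, here is a minimal Python sketch of choosing one hit from a list of search results by edit distance. It is an illustration only, not the selection logic actually used. It assumes the SRU response has already been parsed into (heading, identifier) pairs, and the function names and the max_distance cutoff of 3 are hypothetical.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def best_candidate(name, candidates, max_distance=3):
    """Return the (heading, identifier) pair closest to name, or None.

    candidates is an iterable of (heading, identifier) pairs; max_distance
    is an illustrative cutoff, not a value from the original work.
    """
    scored = [
        (levenshtein(name.lower(), heading.lower()), heading, identifier)
        for heading, identifier in candidates
    ]
    if not scored:
        return None
    distance, heading, identifier = min(scored, key=lambda item: item[0])
    return (heading, identifier) if distance <= max_distance else None
```

A cutoff of this kind trades recall for precision: a lower value links fewer names but makes wrong links less likely, which matters when the identifiers end up in authority records.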

Now that I have close to 75% of my names associated with VIAF identifiers, I 
can update my authority list’s MARC 024 fields. In turn, I can provide 
enhanced services against my catalog as well as pave the way for linked data 
implementations.

Sometimes our library automation tasks can use a bit more computer science. 
Librarianship isn’t all about service and the humanities. Librarianship is an 
arscient discipline. [4]

[1] VIAF Finder - http://infomotions.com/blog/2016/05/viaf-finder/
[2] Almost perfection - http://infomotions.com/blog/2016/06/levenshtein/
[3] Levenshtein - https://en.wikipedia.org/wiki/Levenshtein_distance
[4] arscience - http://infomotions.com/blog/2008/07/arscience/

—
Eric Lease Morgan
