[CODE4LIB] vufind, ead files, harvesting content, and text mining

Eric Lease Morgan Fri, 29 Oct 2010 06:39:16 -0700

I have written a couple of blog postings, as well as bunches o' hacks,
surrounding VuFind, EAD files, harvesting content, and text mining that may be
of interest to us coders:


  1. EAD files - The first posting and its set of Perl scripts describe how I am
currently indexing MARC records and, more importantly, EAD files in VuFind. The
process involves harvesting EAD files from remote locations, transforming them
into HTML, indexing them at the container level, and providing access to the
index. [1, 2]

  2. Internet Archive content - The second posting describes how I mirrored
content from the Internet Archive, munged the mirrored MARC records, indexed
them, and provided a rudimentary text mining interface against the locally
cached full text. [3, 4]
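To make the "indexing them at the container level" step in item 1 a bit more concrete, here is a minimal Python sketch. It is not the Perl code from the posting: it assumes EAD 2002-style markup (numbered c01 to c12 or unnumbered c components), skips the harvesting and HTML transformation steps, and turns each component into its own index record. The function names and record fields (id, collection, title, level, text) are hypothetical, not VuFind's actual Solr schema; records like these would still have to be posted to the Solr index that VuFind searches.

```python
import xml.etree.ElementTree as ET

# EAD 2002 components are either numbered (c01 .. c12) or unnumbered (c).
COMPONENT_TAGS = {"c"} | {f"c{i:02d}" for i in range(1, 13)}


def local_name(tag):
    """Drop any '{namespace}' prefix so plain and namespaced EAD both work."""
    return tag.rsplit("}", 1)[-1]


def find_child(elem, name):
    """Return the first direct child of elem with the given local name, or None."""
    if elem is None:
        return None
    for child in elem:
        if local_name(child.tag) == name:
            return child
    return None


def text_of(elem):
    """Concatenate all text inside elem and collapse runs of whitespace."""
    if elem is None:
        return ""
    return " ".join("".join(elem.itertext()).split())


def ead_to_documents(ead_xml, collection_id):
    """Return one index record per component of an EAD finding aid.

    The record fields are hypothetical, not VuFind's real Solr schema.
    """
    root = ET.fromstring(ead_xml)
    archdesc = next(e for e in root.iter() if local_name(e.tag) == "archdesc")
    collection_title = text_of(find_child(find_child(archdesc, "did"), "unittitle"))

    docs = []
    components = (e for e in archdesc.iter() if local_name(e.tag) in COMPONENT_TAGS)
    for n, comp in enumerate(components, start=1):
        # Index only this component's own text, not that of nested components.
        own_text = [text_of(child) for child in comp
                    if local_name(child.tag) not in COMPONENT_TAGS]
        docs.append({
            "id": f"{collection_id}-{n}",
            "collection": collection_title,
            "title": text_of(find_child(find_child(comp, "did"), "unittitle")),
            "level": comp.get("level", ""),
            "text": " ".join(t for t in own_text if t),
        })
    return docs


sample = """<ead><archdesc level="collection">
  <did><unittitle>Papers of A. Person</unittitle></did>
  <dsc>
    <c01 level="series"><did><unittitle>Correspondence</unittitle></did>
      <c02 level="file"><did><unittitle>Letters, 1901</unittitle>
        <container type="box">1</container></did></c02>
    </c01>
  </dsc>
</archdesc></ead>"""

for doc in ead_to_documents(sample, "demo"):
    print(doc["id"], doc["level"], doc["title"])
```

Indexing per component rather than per finding aid means a search can land on the specific series or file that matches, while the collection title carried in each record keeps the link back to the whole finding aid.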
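The "rudimentary text mining interface" in item 2 could, at its simplest, offer word frequencies and a keyword-in-context concordance over the locally cached full text. The Python sketch below only illustrates that idea under those assumptions; it is not the interface behind [4], and the stop list and function names are hypothetical.

```python
import re
from collections import Counter

# A deliberately tiny stop list for illustration; a real one would be longer.
STOPWORDS = {"a", "an", "and", "in", "is", "it", "of", "that", "the", "to", "was"}


def tokenize(text):
    """Lower-case the text and return its runs of ASCII letters as words."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(text, n=10):
    """Return the n most common non-stop words as (word, count) pairs."""
    counts = Counter(w for w in tokenize(text) if w not in STOPWORDS)
    return counts.most_common(n)


def concordance(text, word, width=30):
    """Return one keyword-in-context line per whole-word, case-insensitive hit."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(m.start() - width, 0):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines


full_text = "The whale swam. The whale dove, and the sea was calm. A whale!"
print(word_frequencies(full_text, n=3))
for line in concordance(full_text, "whale", width=15):
    print(line)
```

Both functions work on a plain string, so the same code could run against any mirrored full-text file once it has been read into memory.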

There are lots of cool (as well as "kewl") possibilities here.

[1] Indexing EAD in VuFind - http://bit.ly/cIu0lG
[2] EAD record in VuFind - http://bit.ly/9Z7GUg
[3] Internet Archive content - http://bit.ly/dbzYyX
[4] Harvested record with text mining - http://bit.ly/ahjLf2

-- 
Eric "@isitfriday" Morgan
