For a good time, I started playing with the HathiTrust Research Center (HTRC) 
and the Great Books Of The Western World.

The HTRC is the beginnings of a service providing a computable interface to 
some of the content of the HathiTrust. A couple of people from the Center came 
to visit Notre Dame a few weeks ago, and I blogged about the event as well as 
some of the functionality of the interface. [0]

In an effort to learn more about the Center's functionality, I began a mini 
project. Specifically, I had a co-worker (Adam McGinn) create a public 
work set containing the Great Books Of The Western World. I then ran an HTRC 
algorithm against the set -- the MARC record dumper. After getting the 
MARC(XML) records I concatenated them into a single file in order to make 
processing easier. I then wrote a Perl script to read each record, extract 
rudimentary bibliographic and control information from the data, and output an 
alphabetical list of the Great Books complete with links to the HathiTrust 
catalog and full view displays. Data, script, and output are available at 
http://bit.ly/10Alu81
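
For the curious, the read-extract-sort-link step can be sketched roughly as 
follows. This is not the Perl script itself but a minimal Python 
illustration using only the standard library. It assumes the concatenated 
records are wrapped in a MARCXML collection element, that the 001 control 
field holds the catalog record identifier, and that catalog links follow the 
pattern https://catalog.hathitrust.org/Record/<id>. The two sample records 
and their identifiers are made up for the sake of the example.

```python
import xml.etree.ElementTree as ET

# MARCXML namespace (Library of Congress MARC21 "slim" schema)
MARC = "http://www.loc.gov/MARC21/slim"
NS = {"marc": MARC}

# Assumed URL pattern for catalog links; not taken from the original script
CATALOG = "https://catalog.hathitrust.org/Record/"

# Hypothetical sample data standing in for the concatenated MARCXML file
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">000000002</controlfield>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Plato.</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">The Republic /</subfield>
    </datafield>
  </record>
  <record>
    <controlfield tag="001">000000001</controlfield>
    <datafield tag="100" ind1="0" ind2=" ">
      <subfield code="a">Homer.</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Iliad /</subfield>
    </datafield>
  </record>
</collection>"""


def subfield(record, tag, code):
    """Return the first matching subfield, minus trailing ISBD punctuation."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, NS)
    if node is None or not node.text:
        return ""
    return node.text.rstrip(" /:,.")


def read_records(xml_text):
    """Extract id, author, and title from each record, sorted by title."""
    root = ET.fromstring(xml_text)
    books = []
    for record in root.iter(f"{{{MARC}}}record"):
        control = record.find("marc:controlfield[@tag='001']", NS)
        record_id = control.text.strip() if control is not None and control.text else ""
        books.append({
            "id": record_id,
            "author": subfield(record, "100", "a"),
            "title": subfield(record, "245", "a"),
            "url": CATALOG + record_id,
        })
    # Plain case-insensitive sort; non-filing indicators are ignored here
    return sorted(books, key=lambda book: book["title"].lower())


if __name__ == "__main__":
    for book in read_records(SAMPLE):
        print(f'{book["title"]} / {book["author"]} - {book["url"]}')
```

Sorting on the raw title means "The Republic" files under T. A fuller 
version would probably honor the second indicator of the 245 field to skip 
leading articles.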

Some of the next steps are to download the raw digitized data and do analysis 
against it. Fun with modern librarianship?

[0] blog posting about the HTRC - http://dh.crc.nd.edu/blog/2013/05/htrc/

--
Eric Lease Morgan, Digital Initiatives Librarian
University of Notre Dame
