Eric, I'm planning a similar but not identical bit of work based on the Open Library. In my case, we are hoping to produce MARC records that libraries can include in their own catalogs for open access books. Because this is aimed at public libraries, we are looking for "popular" reading, as well as "great" reading.

The HathiTrust books, while publicly viewable, are often not downloadable without a partner log-in. Those exact same books are often available on the Internet Archive for public download and in a variety of e-book formats.


On 6/5/13 6:50 AM, Eric Lease Morgan wrote:
For a good time, I started playing with the HathiTrust Research Center (HTRC) 
and the Great Books Of The Western World.

The HTRC is the beginnings of a service providing a computable interface to 
some of the content of the HathiTrust. A couple of people from the Center came 
to visit Notre Dame a few weeks ago, and I blogged about the event as well as 
some of the functionality of the interface. [0]

In an effort to learn more about the Center's functionality, I began a mini 
project. Specifically, I have had a co-worker (Adam McGinn) create a public 
work set containing the Great Books Of The Western World. I then ran an HTRC 
algorithm against the set -- the MARC record dumper. After getting the 
MARC(XML) records I concatenated them into a single file in order to make 
processing easier. I then wrote a Perl script to read each record, extract 
rudimentary bibliographic and control information from the data, and output an 
alphabetical list of the Great Books complete with links to the HathiTrust 
catalog and full view displays. Data, script, and output are available at
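
For anyone wanting to try the same kind of processing, here is a rough sketch in Python rather than the Perl script described above. It is not that script, only an illustration of the read-extract-sort steps. It assumes the concatenated file is a MARCXML collection in the standard MARC21 slim namespace, that field 001 holds the HathiTrust catalog record number, and that catalog pages follow the URL pattern shown. The function names list_books and subfield are made up for this example.

```python
import xml.etree.ElementTree as ET

MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}
# Assumption: catalog record pages follow this pattern, keyed on field 001.
CATALOG_URL = "https://catalog.hathitrust.org/Record/{}"


def subfield(record, tag, code):
    """Return the first matching subfield value, or '' if it is absent."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return node.text.strip() if node is not None and node.text else ""


def list_books(marcxml_text):
    """Extract title, author, and a catalog link from each record,
    then return the entries sorted alphabetically by title."""
    root = ET.fromstring(marcxml_text)
    books = []
    for record in root.iter(f"{{{MARC_URI}}}record"):
        control = record.find("marc:controlfield[@tag='001']", MARC_NS)
        has_id = control is not None and control.text
        record_id = control.text.strip() if has_id else ""
        books.append({
            "title": subfield(record, "245", "a"),    # title proper
            "author": subfield(record, "100", "a"),   # main entry, personal name
            "catalog_link": CATALOG_URL.format(record_id),
        })
    return sorted(books, key=lambda book: book["title"].lower())
```

Calling list_books() on the text of the concatenated file gives a list of dictionaries that can be printed as a plain or HTML list. Sorting on the raw 245$a value ignores non-filing characters, so titles beginning with "The" will sort under T unless that is handled separately.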

Some of the next steps are to download the raw digitized data and do analysis 
against it. Fun with modern librarianship?

[0] blog posting about the HTRC -

Eric Lease Morgan, Digital Initiatives Librarian
University of Notre Dame

Karen Coyle
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
