Eric, I'm planning a similar but not identical bit of work based on the
Open Library. In my case, we are hoping to produce MARC records that
libraries can include in their own catalogs for open access books.
Because this is aimed at public libraries, we are looking for "popular"
reading, as well as "great" reading.
The HathiTrust books, while publicly viewable, are often not
downloadable without a partner log-in. Those exact same books are often
available on the Internet Archive for public download and in a variety
of e-book formats.
kc
On 6/5/13 6:50 AM, Eric Lease Morgan wrote:
For a good time, I started playing with the HathiTrust Research Center (HTRC)
and the Great Books Of The Western World.
The HTRC is the beginnings of a service providing a computable interface to
some of the content of the HathiTrust. A couple of people from the Center came
to visit Notre Dame a few weeks ago, and I blogged about the event as well as
some of the functionality of the interface. [0]
In an effort to learn more about the Center's functionality, I began a mini
project. Specifically, I have had a co-worker (Adam McGinn) create a public
work set containing the Great Books Of The Western World. I then ran an HTRC
algorithm against the set -- the MARC record dumper. After getting the
MARC(XML) records I concatenated them into a single file in order to make
processing easier. I then wrote a Perl script to read each record, extract
rudimentary bibliographic and control information from the data, and output an
alphabetical list of the Great Books complete with links to the HathiTrust
catalog and full view displays. Data, script, and output are available at
http://bit.ly/10Alu81
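For anyone who wants to try the same read-extract-list step without Perl, here is a minimal sketch in Python using only the standard library. It is not the script linked above. It assumes the concatenated file is a MARCXML collection in the standard MARC21 "slim" namespace, that field 001 holds the HathiTrust catalog record number, and that catalog links follow the pattern in CATALOG_URL. The function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace (MARC21 "slim" schema).
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Assumption: catalog links are built from the 001 control number like this.
CATALOG_URL = "https://catalog.hathitrust.org/Record/{}"


def subfield(record, tag, code):
    """Return the first matching subfield value of a record, or ''."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    node = record.find(path, MARC_NS)
    return node.text.strip() if node is not None and node.text else ""


def list_books(marcxml):
    """Extract author, title, and catalog link; sort by title."""
    root = ET.fromstring(marcxml)
    books = []
    for record in root.iter("{http://www.loc.gov/MARC21/slim}record"):
        control = record.find("marc:controlfield[@tag='001']", MARC_NS)
        ident = control.text.strip() if control is not None and control.text else ""
        books.append({
            "author": subfield(record, "100", "a"),  # main entry, personal name
            "title": subfield(record, "245", "a"),   # title proper
            "link": CATALOG_URL.format(ident) if ident else "",
        })
    # Case-insensitive alphabetical order by title.
    return sorted(books, key=lambda book: book["title"].lower())
```

Reading the concatenated file and printing one line per book would then look something like this (the file name is a placeholder):

    with open("great-books.xml", encoding="utf-8") as handle:
        for book in list_books(handle.read()):
            print(book["title"], book["author"], book["link"])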
Some of the next steps are to download the raw digitized data and do analysis
against it. Fun with modern librarianship?
[0] blog posting about the HTRC - http://dh.crc.nd.edu/blog/2013/05/htrc/
--
Eric Lease Morgan, Digital Initiatives Librarian
University of Notre Dame
--
Karen Coyle
kco...@kcoyle.net http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet