Re: [CODE4LIB] internet archive api

Kim Pham Tue, 19 Sep 2017 05:52:26 -0700

Hi Eric,

We also have a series of scripts that we use with the Internet Archive API: 
https://github.com/digitalutsc/internetarchive_scripts. They watch an IA 
collection, download new items, and process them based on a table of contents.
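
For the narrower question of listing the identifiers in a collection without 
scraping HTML, here is a minimal sketch. It is not taken from the scripts 
above. It assumes the public advancedsearch.php endpoint on archive.org and 
its JSON output format; the function names are only illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Internet Archive search endpoint with JSON output.
SEARCH_URL = "https://archive.org/advancedsearch.php"


def build_query_url(collection, rows=100, page=1):
    """Build a search URL that asks only for the identifier field."""
    params = [
        ("q", f"collection:{collection}"),
        ("fl[]", "identifier"),
        ("rows", rows),
        ("page", page),
        ("output", "json"),
    ]
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def extract_identifiers(payload):
    """Pull identifiers out of a decoded JSON search response."""
    docs = payload.get("response", {}).get("docs", [])
    return [doc["identifier"] for doc in docs if "identifier" in doc]


def list_identifiers(collection, rows=100):
    """Yield identifiers page by page until an empty page comes back."""
    page = 1
    while True:
        url = build_query_url(collection, rows, page)
        with urllib.request.urlopen(url) as response:
            payload = json.load(response)
        identifiers = extract_identifiers(payload)
        if not identifiers:
            break
        yield from identifiers
        page += 1


if __name__ == "__main__":
    # "bplsceep" is the collection named in the original question.
    for identifier in list_identifiers("bplsceep"):
        print(identifier)
```

Each identifier can then be appended to https://archive.org/details/ to reach 
the item. Very large collections may run into paging limits on this endpoint, 
so treat the loop as a starting point rather than a complete harvester.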


Kim Pham
Digital Projects & Technologies Librarian | Liaison Librarian, Physical & 
Environmental Sciences (Physics)

UNIVERSITY OF TORONTO SCARBOROUGH
AC 270 | 1265 Military Trail, Toronto, Ontario, M1C 1A4
https://utsc.library.utoronto.ca/

________________________________________
From: Code for Libraries [[email protected]] on behalf of Eric Lease 
Morgan [[email protected]]
Sent: September-18-17 3:37 PM
To: [email protected]
Subject: [CODE4LIB] internet archive api

Is there an Internet Archive API that will allow me to get the contents of a 
collection as a stream of data rather than as a stream of HTML?

A cool collection of early English print materials is available at the 
following URL:

  https://archive.org/details/bplsceep

Each item is associated with an Internet Archive identifier. If I were able to 
easily extract these identifiers, then I would be more easily able to provide 
services based on the collection. But I’m lazy. I don’t want to read the HTML 
and scrape it accordingly. Ick! I’d rather be given the list of bibliographic 
records in a more computer-friendly way.

Again, can I programmatically read the contents of an Internet Archive 
collection?

—
Eric Morgan
