Hi there, 

There’s a Python library that I’ve had good luck with 
(https://internetarchive.readthedocs.io/en/latest/index.html). It comes with a 
command line tool, ia 
(https://internetarchive.readthedocs.io/en/latest/cli.html), that will easily 
do what you’d like. 

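"ia search 'collection:bplsceep'" will return the list of Internet Archive 
identifiers for that collection, one JSON record per line. Adding the 
--itemlist option prints the bare identifiers instead. (Type the quotes as 
plain straight quotes; curly ones will break the command.)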
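
If you would rather stay in Python than shell out to ia, the same search can be run through the library itself. What follows is only a rough sketch: it assumes the library's search_items() function yields one dict per item with an "identifier" key, and the helper names collection_query and collection_identifiers are made up for illustration, not part of the library.

```python
def collection_query(collection_id):
    """Build the search query that matches every item in one collection."""
    return f"collection:{collection_id}"


def collection_identifiers(collection_id, search=None):
    """Yield the identifier of each item in the given collection.

    `search` is any callable that takes a query string and returns an
    iterable of dicts with an "identifier" key. It defaults to
    internetarchive.search_items (assumed behaviour, see the note above).
    """
    if search is None:
        # Imported lazily so the helper can be tried out with a stand-in
        # search function when the library is not installed.
        from internetarchive import search_items as search
    for result in search(collection_query(collection_id)):
        yield result["identifier"]
```

With the library installed (pip install internetarchive) and a network connection, looping over collection_identifiers("bplsceep") and printing each value should give the same identifiers as the ia command above.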

Best,
Katie


-- 
 
Katie Mika
Biodiversity Heritage Library NDSR Resident
Ernst Mayr Library, Museum of Comparative Zoology, Harvard University
km...@fas.harvard.edu | 281-384-5789
 


On 9/18/17, 3:37 PM, "Code for Libraries on behalf of Eric Lease Morgan" 
<CODE4LIB@LISTS.CLIR.ORG on behalf of emor...@nd.edu> wrote:

    Is there an Internet Archive API that will allow me to get the contents of 
a collection as a stream of data and not as a stream of HTML?
    
    A cool collection of early English print materials is available at the 
following URL:
    
      https://archive.org/details/bplsceep
    
    Each item is associated with an Internet Archive identifier. If I were able 
to easily extract these identifiers, then I would be more easily able to 
provide services based on the collection. But I’m lazy. I don’t want to read 
the HTML and scrape it accordingly. Ick! I’d rather be given the list of 
bibliographics in a more computer-friendly way.
    
    Again, can I programmatically read the contents of an Internet Archive 
collection?
    
    —
    Eric Morgan
    
