There's a Very Large Catalog that includes the holdings of many libraries. Links from that VLC into our OPAC (aka Walter) assume that our MARC records include the relevant VLC#s (or sometimes an ISBN), which they don't always do, often because many of our VLC numbers appear to be obsolete. So, when Walter can't find anything for a given VLC#, he screen scrapes the VLC to find the title and author (if there is one) so that he can then send you into the OPAC with a keyword search that might work.
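A minimal sketch of that kind of fallback, assuming a hypothetical record-page layout and OPAC search URL (the markup, class names, and `opac.example.edu` pattern are invented; none of them come from Walter itself):

```python
import re
import urllib.parse

def scrape_title_author(html):
    """Pull title and (optional) author out of a fetched VLC record page.

    The <h1 class="title"> / <span class="author"> markup is a hypothetical
    stand-in for whatever the real record page uses.
    """
    title = re.search(r'<h1 class="title">([^<]+)</h1>', html)
    author = re.search(r'<span class="author">([^<]+)</span>', html)
    return (title.group(1).strip() if title else None,
            author.group(1).strip() if author else None)

def keyword_search_url(title, author, opac="https://opac.example.edu/search"):
    """Build the keyword search to fall back on when the VLC# lookup fails."""
    terms = " ".join(t for t in (title, author) if t)
    return opac + "?type=keyword&q=" + urllib.parse.quote_plus(terms)

# In production you would first fetch the VLC record page, e.g. with
# urllib.request.urlopen(vlc_url).read().decode() -- stubbed here with a string.
page = '<h1 class="title">Moby-Dick</h1><span class="author">Melville, Herman</span>'
t, a = scrape_title_author(page)
print(keyword_search_url(t, a))
```

The point of the two-step shape: the scrape only has to recover enough free text to seed a keyword search, so it degrades gracefully when the record has no author at all.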
I've been keeping a log of titles that have required this intervention and am happy to report that it works well. When it doesn't, we now have the information we need to follow up and remove our holdings from the Very Large Catalog.

Guy

*Guy Dobson*
Systems Librarian | Library
Drew University | 36 Madison Ave | Madison, NJ 07940
(973) 408-3207 | drew.edu

On Tue, Nov 28, 2017 at 3:13 PM, Kenny Ketner <kenny.ket...@gmail.com> wrote:

> Brad et al,
>
> We use wget scripts to back up our Internet Archive pages, which, oddly
> enough, are the instructions given by the Internet Archive itself. :/
>
> Kenny Ketner
> Information Products Lead
> Montana State Library
> 406-444-2870
> kket...@mt.gov
> kennyketner.com
>
> On Tue, Nov 28, 2017 at 12:31 PM, Brett <brett.l.willi...@gmail.com> wrote:
>
> > Yes, I did ask, and ask, and ask, and waited for 2 months. There was
> > something political going on internally with that group that was well
> > beyond my pay grade.
> >
> > I did explain the potential problems to my boss and she was providing
> > cover.
> >
> > I did it in batches, as Google Sheets limits the amount of IMPORTXML that
> > you can do in a 24-hour span, so I wasn't hammering anyone's web server
> > into oblivion.
> >
> > It's funny: I actually had to do a fair amount to get the old V1 LibGuides
> > link checker to stop hammering my ILS into going offline back in 2010-2011.
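The batching tactic Brett describes above (space the queries out so neither the quota nor the target server suffers) can be sketched like this; the batch size, pause length, and `lookup` callable are illustrative numbers and names, not anything Google or any ILS documents:

```python
import time

def batches(items, size):
    """Yield successive chunks of `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def run_throttled(call_numbers, lookup, batch_size=50, pause=60):
    """Run `lookup` over call numbers in small batches, pausing between
    batches so no one's web server gets hammered into oblivion.

    batch_size/pause are made-up defaults; tune them to the service's
    actual rate limits.
    """
    results = {}
    for chunk in batches(call_numbers, batch_size):
        for cn in chunk:
            results[cn] = lookup(cn)
        time.sleep(pause)  # be polite between batches
    return results
```

Injecting `lookup` as a parameter also makes the throttling logic testable without any network traffic.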
> >
> > On Tue, Nov 28, 2017 at 2:18 PM, Bill Dueber <b...@dueber.com> wrote:
> >
> > > Brett, did you ask the folks at the Large University Library if they
> > > could set something up for you? I don't have a good sense of how other
> > > institutions deal with things like this.
> > >
> > > In any case, I know I'd much rather talk about setting up an API or a
> > > nightly dump or something rather than have my analytics (and bandwidth!)
> > > blown by a screen scraper. I might say "no," but at least it would be an
> > > informed "no" :-)
> > >
> > > On Tue, Nov 28, 2017 at 2:08 PM, Brett <brett.l.willi...@gmail.com> wrote:
> > >
> > > > I leveraged the IMPORTXML() and XPath features in Google Sheets to pull
> > > > information from a large university website to help create a set of
> > > > weeding lists for a branch campus. They needed extra details about what
> > > > was in off-site storage and what was held at the central campus library.
> > > >
> > > > This was very much like Jason's FIFO API: the central reporting group
> > > > had sent me a spreadsheet with horrible data that I would have had to
> > > > sort out almost completely manually, but the call numbers were pristine.
> > > > I used the call numbers as a key to query the catalog with limits for
> > > > each campus I needed to check, and then it dumped all of the necessary
> > > > content (holdings, dates, etc.) into the spreadsheet.
> > > >
> > > > I've also used Feed43 as a way to modify certain RSS feeds and scrape
> > > > websites to display only the content I want.
> > > >
> > > > Brett Williams
> > > >
> > > > On Tue, Nov 28, 2017 at 1:24 PM, Brad Coffield <
> > > > bcoffield.libr...@gmail.com> wrote:
> > > >
> > > > > I think there's likely a lot of possibilities out there and was hoping
> > > > > to hear examples of web scraping for libraries.
> > > > > Your example might just inspire me or another reader to do something
> > > > > similar. At the very least, the ideas will be interesting!
> > > > >
> > > > > Brad
> > > > >
> > > > > --
> > > > > Brad Coffield, MLIS
> > > > > Assistant Information and Web Services Librarian
> > > > > Saint Francis University
> > > > > 814-472-3315
> > > > > bcoffi...@francis.edu
> > >
> > > --
> > > Bill Dueber
> > > Library Systems Programmer
> > > University of Michigan Library
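Several of the messages above come down to the same move: key on one clean field (the call number) and pull structured bits out of result pages with XPath. In Google Sheets that's a formula of the form `=IMPORTXML(url, xpath)`. A Python sketch of the same idea, where the catalog URL pattern, the `td` markup, and the XPath are all invented for illustration:

```python
import urllib.parse
import xml.etree.ElementTree as ET

def catalog_url(call_number, campus):
    """Build a per-campus catalog query keyed on call number
    (the URL pattern here is hypothetical)."""
    return ("https://catalog.example.edu/search?campus=" + campus +
            "&cn=" + urllib.parse.quote_plus(call_number))

def extract_holdings(page_xml):
    """Python analogue of =IMPORTXML(url, "//td[@class='holdings']"):
    pull every holdings cell out of a result page. ElementTree only
    handles well-formed markup; real-world HTML may need a more
    forgiving parser.
    """
    root = ET.fromstring(page_xml)
    return [td.text.strip() for td in root.findall(".//td[@class='holdings']")]

# Stand-in for a fetched results page (normally you'd fetch catalog_url(...)).
sample = """<table>
  <tr><td class="cn">QA76.9 .D3</td><td class="holdings">Main: 2 copies</td></tr>
  <tr><td class="cn">QA76.9 .D3</td><td class="holdings">Storage: 1 copy</td></tr>
</table>"""
print(extract_holdings(sample))
```

As Bill's reply notes, an API or a nightly dump is kinder to everyone than looping this over thousands of call numbers; the sketch is for the cases where scraping is genuinely the only option.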