I thought someone out there might be interested in a poster session I just presented at the Innovative Users Group Conference 2009. I undertook the project because I was personally interested in the outcome, and because I look forward to the day when these data will be available: from Google, from the Internet Archive, from HathiTrust, from ????.

It's fraught with problems, including both recall and precision errors, but I call it an "approximation" of citation searching: I searched for the books in the Colgate collection, then ranked them by the number of hits.

From the Colgate library catalog, I took about 688,000 monographic records that had both an author and a title, and constructed a Google Book Search query for each one. Since I wanted to find citations (that is, other books that mentioned the book in question), I didn't restrict the search by field. Each query was:

Title phrase from 245 subfields a & b, up to 10 words long,
plus one of:
  first two words of the author (if a personal author)
  author phrase (if a conference author)
  first 6 words of the author (if a corporate author)

I ran these searches between 3/1/2009 and 4/27/2009 at fewer than 380 searches an hour (it took 3 machines to get the job done in 6 weeks), and screen-scraped the hit count from Google's reported "1 to 8 of <#hits> records".

The results rank the books by the number of "citations":

http://lisv06.colgate.edu/GBSCites/default.aspx

My results omit GovDocs for the time being, because I forgot to download the 086 fields into the records; I could add them later. Those corporate bodies are a problem for my search strategy anyway. I did include them in the search portion of the project.

I don't know how many users this MySQL site will support. It hasn't been stress-tested at all, but I trust you won't all go searching it at once.

--
Cindy Harper, Systems Librarian
Colgate University Libraries
char...@colgate.edu
315-228-7363
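For anyone curious about the mechanics, here is a minimal Python sketch of the query recipe and ranking described above. It is not the code used for this project. The following are all assumptions made for illustration:
- the author-type labels
- passing 245 $a, 245 $b and the author as plain strings
- the punctuation cleanup
- quoting the title and conference name as phrases
- the regular expression for Google's "1 to 8 of <#hits>" text

The sketch does no fetching. It only builds query strings, parses a results-page string, and sorts by hit count.

```python
import re

# Hypothetical labels for the three author types distinguished in the post.
PERSONAL, CONFERENCE, CORPORATE = "personal", "conference", "corporate"

# Staying under 380 searches an hour on one machine means waiting at least
# 3600 / 380 (about 9.5) seconds between searches.
SECONDS_BETWEEN_SEARCHES = 3600 / 380

# Assumed cleanup: replace ISBD/MARC punctuation such as ":", "/", "," and "."
# with spaces before splitting into words.
_PUNCT = re.compile(r"[/:;=,.\[\]]")

# Assumed shape of the result-count text, e.g. "1 to 8 of 1,234".
_HITS = re.compile(r"\d+\s*(?:to|-)\s*\d+\s+of\s+(?:about\s+)?(\d+(?:,\d{3})*)")


def _words(text: str) -> list[str]:
    """Split a field into words after stripping punctuation."""
    return _PUNCT.sub(" ", text).split()


def build_query(title_a: str, title_b: str, author: str, author_type: str) -> str:
    """Title phrase (245 $a $b, at most 10 words) plus an author part chosen by type."""
    title_phrase = '"' + " ".join(_words(f"{title_a} {title_b}")[:10]) + '"'
    author_words = _words(author)
    if author_type == PERSONAL:
        author_part = " ".join(author_words[:2])          # first two words
    elif author_type == CONFERENCE:
        author_part = '"' + " ".join(author_words) + '"'  # whole name as a phrase
    elif author_type == CORPORATE:
        author_part = " ".join(author_words[:6])          # first six words
    else:
        raise ValueError(f"unknown author type: {author_type!r}")
    return f"{title_phrase} {author_part}"


def parse_hit_count(page_text: str) -> int:
    """Pull <#hits> out of the results text; a missing count is treated as 0."""
    match = _HITS.search(page_text)
    return int(match.group(1).replace(",", "")) if match else 0


def rank_by_hits(hits: dict[str, int]) -> list[tuple[str, int]]:
    """Order records by hit count, most-"cited" first."""
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Prints: "The origin of species by means of natural selection" Darwin Charles
    print(build_query("The origin of species :", "by means of natural selection /",
                      "Darwin, Charles,", PERSONAL))
    # Prints: 1234
    print(parse_hit_count("Books 1 to 8 of 1,234"))
```

Treating a page with no recognizable count as zero hits is itself a recall/precision trade-off. A real scraper would probably want to log those pages separately rather than silently ranking them last.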