I'm responsible for assessing the use of our library website by pondering the web server logs. The log I get from our system administrator doesn't include a translation of the numerical addresses of what Analog calls Hosts, nor does it have anything about User Agent.
Here are a couple of lines from the log I get:

    64.68.82.184 - - [01/Aug/2004:00:02:12 +0800] "GET /ref/govfedjust.html HTTP/1.0" 200 8268
    64.68.82.30 - - [01/Aug/2004:00:02:12 +0800] "GET /ref/govfedmil.html HTTP/1.0" 200 6903

Out of curiosity I have been translating the numerical addresses of the Hosts from which we get a lot of requests, and I notice that a lot of them are from googlebot and the like. I'm thinking this doesn't really count as the sort of "use" of our website I want to keep track of.

On the other hand, disregarding requests from these hosts takes some effort. There's some helpful information at http://www.iplists.com/ and also at http://www.searchengineworld.com/spiders/spider_ips.htm but I'm discouraged about how complicated this could become.

So I'm looking for advice about what to do. I'm thinking about looking at the hosts from which I get more than x requests in a month, figuring out which of those are search engines, and throwing them out. I might use a percentage of requests rather than a specific number of requests as a threshold.

Thanks for your attention.

Tom Zimoski
Reference Dept/Fresno County Library
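
For reference, the approach described above (count requests per host, then review only the hosts above a percentage threshold) could be sketched roughly as follows. This is a minimal sketch, not anything Analog does itself: it assumes the log lines look like the two samples quoted in the post, and the function names, the 192.0.2.10 lines, and the threshold value are made up for illustration. Grouping by the first three octets is optional; it is included only because a crawler may spread its requests over several nearby addresses.

```python
import re
from collections import Counter

# Matches the fields visible in the sample lines:
# host, two unused fields, [timestamp], "request", status, size.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def prefix24(host):
    """Return the first three octets of a dotted address (hypothetical helper)."""
    parts = host.split(".")
    return ".".join(parts[:3]) + "." if len(parts) == 4 else host


def count_requests(lines, key=lambda host: host):
    """Count requests per host, or per group if a key function is given."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:  # lines that do not parse are skipped
            counts[key(match.group("host"))] += 1
    return counts


def heavy_hosts(counts, min_share=0.01):
    """List (host, requests, share) for hosts at or above min_share of all requests."""
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (host, n, n / total)
        for host, n in counts.most_common()
        if n / total >= min_share
    ]


# The first two lines are from the post; the 192.0.2.10 lines are invented.
SAMPLE = [
    '64.68.82.184 - - [01/Aug/2004:00:02:12 +0800] "GET /ref/govfedjust.html HTTP/1.0" 200 8268',
    '64.68.82.30 - - [01/Aug/2004:00:02:12 +0800] "GET /ref/govfedmil.html HTTP/1.0" 200 6903',
    '192.0.2.10 - - [01/Aug/2004:00:05:00 +0800] "GET /index.html HTTP/1.0" 200 1200',
    '192.0.2.10 - - [01/Aug/2004:00:06:00 +0800] "GET /ref/index.html HTTP/1.0" 200 3400',
]

if __name__ == "__main__":
    # min_share=0.5 is an arbitrary illustration, not a recommended threshold.
    for host, n, share in heavy_hosts(count_requests(SAMPLE), min_share=0.5):
        print(f"{host}\t{n}\t{share:.1%}")
```

The output is only a shortlist for manual review: each heavy host would still need to be looked up, by hand or against lists like the ones linked above, before deciding whether it is a search engine.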

