[analog-help] Disregarding page requests from Googlebot and the like

Zimoski, Tom Thu, 16 Sep 2004 14:43:40 -0700

I'm responsible for assessing the use of our library website by
analyzing the web server logs.  The log I get from our system
administrator doesn't resolve the numerical addresses of what Analog
calls Hosts into names, nor does it record the User Agent.


Here are a couple of lines from the log I get:

64.68.82.184 - - [01/Aug/2004:00:02:12 +0800] "GET /ref/govfedjust.html HTTP/1.0" 200 8268

64.68.82.30 - - [01/Aug/2004:00:02:12 +0800] "GET /ref/govfedmil.html HTTP/1.0" 200 6903
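
These entries look like the Common Log Format (host, identity, user,
timestamp, request, status, bytes), which would explain why there is no
User Agent field.  As a rough sketch, assuming that format, each entry
can be split into fields like this.  The names LOG_PATTERN and
parse_line are made up for illustration and have nothing to do with
Analog itself:

```python
import re

# Common Log Format: host ident user [timestamp] "request" status bytes
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)\s*$'
)


def parse_line(line):
    """Return a dict of fields for one log entry, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The request field looks like: GET /path HTTP/1.0
    parts = fields["request"].split()
    fields["path"] = parts[1] if len(parts) >= 2 else None
    return fields


# First sample entry from the log, joined onto one line.
sample = ('64.68.82.184 - - [01/Aug/2004:00:02:12 +0800] '
          '"GET /ref/govfedjust.html HTTP/1.0" 200 8268')
entry = parse_line(sample)
print(entry["host"], entry["path"], entry["status"])
```

Entries that do not fit the pattern come back as None, so they can be
counted or skipped rather than stopping the run.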

Out of curiosity I have been translating the numerical addresses of the
Hosts from which we get a lot of requests, and I notice that a lot of
them are from Googlebot and the like.  I'm thinking this doesn't really
count as the sort of "use" of our website I want to keep track of.  On
the other hand, disregarding requests from these sites takes some
effort.  
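
Translating the addresses by hand could be automated with a reverse DNS
lookup.  The following is only a sketch: it assumes that a crawler's
address resolves to a name ending in a recognizable suffix such as
".googlebot.com", and the suffix list, resolve and is_crawler are
illustrative names of my own choosing.  A real list would need more
entries, and not every crawler has a telltale hostname.

```python
import socket

# Assumed, deliberately incomplete list of crawler hostname suffixes.
CRAWLER_SUFFIXES = (".googlebot.com",)


def resolve(ip):
    """Reverse-resolve an IP address; return None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        # Covers socket.herror and socket.gaierror (no reverse record, etc.).
        return None


def is_crawler(ip, resolver=resolve):
    """True if the resolved hostname ends with one of the assumed suffixes."""
    name = resolver(ip)
    return name is not None and name.lower().endswith(CRAWLER_SUFFIXES)
```

Calling is_crawler("64.68.82.184") performs a live lookup, so the answer
depends on what DNS returns at the time.  The resolver argument is there
so that a cached or offline table can be passed in instead.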

There's some helpful information at http://www.iplists.com/ and also at
http://www.searchengineworld.com/spiders/spider_ips.htm but I'm
discouraged about how complicated this could become.  

So I'm looking for advice about what to do.  I'm thinking about looking
at the hosts from which I get more than x requests in a month, figuring
out which of those are search engines, and throwing them out.  I might
use a percentage of requests rather than a specific number of requests
as a threshold.
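
The percentage idea might be sketched roughly as follows.  This assumes
the host of every request has already been pulled out of the log into a
list; busy_hosts, drop_hosts and the 1% default are hypothetical choices
for illustration, not anything Analog provides.

```python
from collections import Counter


def busy_hosts(hosts, min_share=0.01):
    """Return (host, count, share) for hosts with at least min_share of requests."""
    counts = Counter(hosts)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (host, n, n / total)
        for host, n in counts.most_common()
        if n / total >= min_share
    ]


def drop_hosts(hosts, excluded):
    """Return the request list without requests from the excluded hosts."""
    excluded = set(excluded)
    return [h for h in hosts if h not in excluded]
```

The first function only produces a shortlist of heavy hosts to review.
Deciding which of them are search engines is still a separate step, and
the second function then removes the ones chosen before counting "use".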

Thanks for your attention.

Tom Zimoski
Reference Dept/Fresno County Library

+------------------------------------------------------------------------
|  TO UNSUBSCRIBE from this list:
|    http://lists.meer.net/mailman/listinfo/analog-help
|
|  Usenet version: news://news.gmane.org/gmane.comp.web.analog.general
|  List archives:  http://www.analog.cx/docs/mailing.html#listarchives
+------------------------------------------------------------------------
