Many administrators record only the numerical IP addresses in their logs, since that saves time when the entries are written; the numbers can always be resolved to hostnames later. Hopefully you're using a utility that resolves the addresses in batch rather than converting them one at a time.
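Analog can do the batch conversion itself and cache the results between runs. A minimal sketch of the relevant configuration commands, assuming a recent Analog (the cache file name and the hour values are just examples):

  # Look up hostnames for the numerical addresses and save the
  # results to a cache file, so later runs don't repeat the lookups.
  DNS WRITE
  DNSFILE dnscache.txt
  # How long cached successful/failed lookups stay valid, in hours.
  DNSGOODHOURS 672
  DNSBADHOURS 168

On later runs you can switch to DNS READ to use the cache without doing any new lookups.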
By default, some servers don't record the User Agent in the log file. Your administrator could start collecting User Agent information (though he or she might complain about the additional disk space it consumes). If User Agent data wasn't being collected at the start of the log file but was added part way through, you might see a warning that some lines can't be read. If that happens, the file can still be analyzed in full; a sketch follows below.
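For what it's worth, on Apache the change is usually just logging with the combined format instead of the common format, and on the Analog side you can (as I understand it) list both formats so the old lines and the new lines are both read. A rough sketch, with the log file path as a placeholder:

  # Apache side: add Referer and User-Agent to each log line.
  LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
  CustomLog /var/log/httpd/access_log combined

  # Analog side: try both formats on each line of the same file.
  LOGFORMAT COMMON
  LOGFORMAT COMBINED
  LOGFILE /var/log/httpd/access_log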
Analog ordinarily reports all of the usage information found in your log. That is helpful to many organizations, but you can also exclude hosts or files you don't want counted.
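For example, to throw out the robot traffic you mentioned, you could exclude it by hostname or address before the reports are built (the patterns below are only illustrative; check what the robots in your own log actually resolve to):

  # Don't count requests from known crawler hosts.
  HOSTEXCLUDE *.googlebot.com
  HOSTEXCLUDE 64.68.82.*
  # Requests for robots.txt are almost always robots, too.
  FILEEXCLUDE /robots.txt

Excluded requests then simply don't show up anywhere in the reports.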
Some specific pages that might be useful:
* http://www.analog.cx/docs/include.html
* http://www.analog.cx/docs/meaning.html
* http://www.analog.cx/helpers/
Hope that helps,
-- Duke
Zimoski, Tom wrote:
I'm responsible for assessing the use of our library website by pondering the web server logs. The log I get from our system administrator doesn't include a translation of the numerical addresses of what Analog calls Hosts, nor does it have anything about User Agent.
Here are a couple of lines from the log I get:
64.68.82.184 - - [01/Aug/2004:00:02:12 +0800] "GET /ref/govfedjust.html HTTP/1.0" 200 8268
64.68.82.30 - - [01/Aug/2004:00:02:12 +0800] "GET /ref/govfedmil.html HTTP/1.0" 200 6903
Out of curiosity I have been translating the numerical addresses of the
Hosts from which we get a lot of requests, and I notice that a lot of
them are from googlebot and the like. I'm thinking this doesn't really
count as the sort of "use" of our website I want to keep track of. On
the other hand, disregarding requests from these sites takes some
effort.
There's some helpful information at http://www.iplists.com/ and also at
http://www.searchengineworld.com/spiders/spider_ips.htm but I'm
discouraged about how complicated this could become.
So I'm looking for advice about what to do. I'm thinking about looking at the hosts from which I get more than x requests in a month, figuring out which of those are search engines, and throwing them out. I might use a percentage of requests rather than a specific number of requests as a threshold.
Thanks for your attention.
Tom Zimoski
Reference Dept/Fresno County Library

