Note that ROBOTINCLUDE and ROBOTEXCLUDE determine which browsers count as "robots" in the Operating System Report only. They don't work like HOST[EX|IN]CLUDE or FILE[EX|IN]CLUDE, which [include|exclude] a host or file from all reports. Also be aware that ROBOT[EX|IN]CLUDE rely on the User Agent string. Currently, User Agent strings aren't being logged.
-- Duke
Leonard Daly wrote:
At 02:28 PM 9/16/04, Zimoski, Tom wrote:
I'm responsible for assessing the use of our library website by pondering the web server logs. The log I get from our system administrator doesn't include a translation of the numerical addresses of what Analog calls Hosts, nor does it have anything about User Agent.
<snip...>
Out of curiosity I have been translating the numerical addresses of the
Hosts from which we get a lot of requests, and I notice that a lot of
them are from googlebot and the like. I'm thinking this doesn't really
count as the sort of "use" of our website I want to keep track of. On
the other hand, disregarding requests from these sites takes some
effort.
<snip...>
So I'm looking for advice about what to do. I'm thinking about looking at the hosts from which I get more than x requests in a month, figuring out which of those are search engines, and throwing them out. I might use a percentage of requests rather than a specific number of requests as a threshold.
Thanks for your attention.
Tom Zimoski Reference Dept/Fresno County Library
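The percentage threshold described above could be sketched roughly as follows in Python. This is an illustration only and not part of Analog. It assumes the client address is the first whitespace-separated field on each log line (as in the common log format); the function names and the 5% default cutoff are made up for the example.

```python
from collections import Counter


def count_requests_by_host(log_lines):
    """Count requests per host, taking the first field of each line as the host."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def hosts_above_share(counts, min_share=0.05):
    """Return (host, share) pairs whose share of all requests is at least min_share."""
    total = sum(counts.values())
    if total == 0:
        return []
    heavy = [(host, n / total) for host, n in counts.items() if n / total >= min_share]
    # Busiest hosts first, so the likeliest crawlers get reviewed first.
    return sorted(heavy, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample lines; the addresses are from a documentation-only range.
    sample = [
        '192.0.2.10 - - [16/Sep/2004:14:28:00 -0700] "GET / HTTP/1.0" 200 1024',
        '192.0.2.10 - - [16/Sep/2004:14:28:05 -0700] "GET /hours.html HTTP/1.0" 200 512',
        '192.0.2.77 - - [16/Sep/2004:14:29:00 -0700] "GET / HTTP/1.0" 200 1024',
    ]
    for host, share in hosts_above_share(count_requests_by_host(sample), min_share=0.5):
        print(f"{host}: {share:.0%} of requests")
```

The output is only a list of candidates. Deciding which of those hosts are search engines still takes a manual check or a host-name lookup.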
Tom,
What you are looking to do is called DNS Lookup. analog has documentation at http://analog.cx/docs/dns.html. The first time you have analog do DNS Lookup, it will take a long time as analog goes and translates each IP address to its host name. You can configure analog so that subsequent runs of analog go much faster by using the DNS cache (see DNSFILE).
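For readers who want to see what a cached reverse lookup amounts to outside of analog, here is a minimal Python sketch. It is not how analog implements DNS lookup or DNSFILE. The function names, the in-memory dictionary cache, and the example host-name suffix are all assumptions made for illustration.

```python
import socket


def reverse_lookup(ip, cache, resolver=socket.gethostbyaddr):
    """Return the host name for ip, or ip itself if it cannot be resolved.

    Results (including failures) are stored in cache so that each
    address is only looked up once.
    """
    if ip in cache:
        return cache[ip]
    try:
        name = resolver(ip)[0]  # gethostbyaddr returns (hostname, aliases, addresses)
    except OSError:  # covers socket.herror and socket.gaierror
        name = ip
    cache[ip] = name
    return name


def looks_like_crawler(hostname, suffixes=(".googlebot.com",)):
    """Rough check: does the host name end with a known crawler suffix?

    The default suffix is only an example; extend the tuple after
    checking what actually shows up in your own logs.
    """
    return hostname.lower().endswith(suffixes)


if __name__ == "__main__":
    cache = {}
    for ip in ["192.0.2.10", "192.0.2.10"]:  # the second call is served from the cache
        name = reverse_lookup(ip, cache)
        print(ip, name, looks_like_crawler(name))
```

The cache here lives only in memory. A file-backed cache, which is the role DNSFILE plays for analog, is what lets later runs skip the slow lookups.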
Robots (search engines and the like) are handled through the ROBOT* commands. See http://analog.cx/docs/include.html#ROBOTINCLUDE for a start. The pre-configured .cfg file contains a good starting point.
+--------
| Leonard Daly
| Internet Development http://realism.com/
| e3D News Technical Editor http://e3dNews.com/
| SIGGRAPH 2002&2003 X3D Course Organizer
| Member, Web3D Board of Directors
+------------------------------
Duke Hillard
University Webmaster
University Computing Support Services, University of Louisiana at Lafayette
http://www.louisiana.edu/

