On 12/15/99 at 12:32PM Boris Goldowsky wrote:
>I've been asked to report page-view statistics for our web site
>which eliminate page views from search-engine spiders and other robots.
>
>I've tried to do some of this by coming up with a list of User-Agent
>strings that look like spiders, but it seems like a hit-or-miss sort
>of approach.
There's nothing to distinguish one HTTP request from another (whether
from a person or a robot) except the IP address it comes from and the
User-Agent string. And, almost by definition, you can't have a
definitive list of these. (For example, most search-engine software
allows the agent string to be customized.)
But all well-behaved spiders request /robots.txt before anything else
on your server, so if you run a report on just that file, you should get
a quick rundown of the User-Agents or IP addresses to exclude.
(Obviously, a person can request that file manually, but that doesn't
happen very often.)
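As a rough sketch of that approach (this is my own illustration, not a
tested report script -- it assumes your server writes the combined log
format, and the function names are just made up for the example), you
could collect the User-Agents that fetched /robots.txt and then exclude
every hit from those agents when counting page views:

```python
import re

# Matches the Apache "combined" log format: remote IP, request line,
# status, size, referer, and the quoted User-Agent at the end.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def robot_agents(log_lines):
    """Return the set of User-Agent strings that requested /robots.txt."""
    agents = set()
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and m.group("path").lower() == "/robots.txt":
            agents.add(m.group("agent"))
    return agents

def human_page_views(log_lines):
    """Count hits, excluding any whose agent ever fetched /robots.txt."""
    robots = robot_agents(log_lines)
    count = 0
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and m.group("agent") not in robots:
            count += 1
    return count
```

The same idea works with IP addresses instead of (or in addition to)
User-Agents, which catches spiders that fake a browser agent string --
though, as noted above, a person who requests /robots.txt by hand would
then be excluded too.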
>Bng
Aengus