At 12.32 15/12/99 -0500, Boris Goldowsky wrote:
>I've been asked to report page-view statistics for our web site
>which eliminate page views from search-engine spiders and other robots.
>I've tried to do some of this by coming up with a list of User-Agent
>strings that look like spiders, but it seems like a hit-or-miss sort
>of approach.
At 18.37 15/12/99 +0000, Stephen Turner wrote:
>Of course, some things are obviously spiders. ...
> But I think there are also some spiders which
>have a plain Mozilla user-agent. One could only spot these by the speed of
>requests over a period of tens of minutes or hours and filter them out
>manually. I don't know of a good way to do this automatically.
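(A rough sketch of that speed heuristic, assuming Common Log Format logs
on stdin; the 30-requests-per-10-minutes threshold is a number I made up,
and it only flags candidate hosts to check by hand:)

import re
import sys
from collections import defaultdict

WINDOW_MINUTES = 10
THRESHOLD = 30  # requests per window before a host looks suspicious

# host, skip ident and authuser, then the [dd/Mon/yyyy:hh:mm timestamp
line_re = re.compile(r'^(\S+) \S+ \S+ \[(\d+/\w+/\d+):(\d+):(\d+)')

counts = defaultdict(int)  # (host, date, hour, 10-min bucket) -> requests
for line in sys.stdin:
    m = line_re.match(line)
    if m:
        host, date, hour, minute = m.groups()
        counts[(host, date, hour, int(minute) // WINDOW_MINUTES)] += 1

for host in sorted({key[0] for key, n in counts.items() if n >= THRESHOLD}):
    print(host)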
At 17.40 15/12/99 -0500, Aengus Lawlor wrote:
>There's nothing to distinguish any HTTP request (from a person or a
>robot), except the IP address it's coming from, and the UserAgent
>string.
Supposing I find every robot (by grepping the logs for robots.txt
requests, as sketched below, or by manually watching for heavy traffic or
for engines that annoy webmasters): how can I group them together, apart
from real browsers, in the BROWSER report?
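For the grep part, this is roughly what I mean, assuming Combined Log
Format (the variant with the user-agent in the last quoted field): collect
the User-Agent of every client that asked for /robots.txt.

import re
import sys

# match: "GET /robots.txt ..." status bytes "referrer" "user-agent"
robots_re = re.compile(
    r'"(?:GET|HEAD) /robots\.txt[^"]*" \d+ \S+ "[^"]*" "([^"]*)"')

agents = set()
for line in sys.stdin:
    m = robots_re.search(line)
    if m:
        agents.add(m.group(1))

for agent in sorted(agents):
    print(agent)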
I tried things like
BROWALIAS Scooter* "Robot Scooter$1"
BROWALIAS Slurp* "Robot Slurp$1"
and
BROWOUTPUTALIAS Scooter* "Robot Scooter$1"
BROWOUTPUTALIAS Slurp* "Robot Slurp$1"
but the robots always come out as separate entries in the report.
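One guess I have not tried yet: drop the $1 so that every match is renamed
to one and the same string, which (if I understand aliases right) should
merge all the robots into a single line:
BROWALIAS Scooter* Robots
BROWALIAS Slurp* Robots
But then I lose the individual robot names, so it is not quite what I want.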
Any clue?
Thanks in advance!
Marco Bernardini