Here are a few other ideas on the spider issue.

You can try the Web Robots Database at:
http://info.webcrawler.com/mak/projects/robots/active.html
However, the page itself states: "Note that now robot technology is being
used in increasing numbers of end-user products, this list is becoming less
useful and complete."

Oh, well!  You might also try BotWatch, a Perl utility, at:
http://www.tardis.ed.ac.uk/~sxw/robots/
"Whilst the majority of web servers keep comprehensive transfer logs, 
there can be difficulties in identifying robots activity from these 
logs. It is possible to identify robots manually from log files, 
although it is a time consuming process. Far better is an automated 
approach - such as Botwatch, a perl utility."
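The same idea is easy to sketch in a few lines of Python. This is only a rough illustration of the technique, not BotWatch itself: the sample log lines and the list of tell-tale substrings are made up, and a real list of robot hints would need to be much longer.

```python
import re

# Made-up sample lines in Apache combined log format, for illustration only.
SAMPLE_LOG = '''\
1.2.3.4 - - [15/Dec/1999:11:33:01 -0700] "GET / HTTP/1.0" 200 1043 "-" "Mozilla/4.7 [en] (WinNT; I)"
5.6.7.8 - - [15/Dec/1999:11:33:05 -0700] "GET /robots.txt HTTP/1.0" 200 68 "-" "Scooter/2.0 G.R.A.B. X2.0"
5.6.7.8 - - [15/Dec/1999:11:33:06 -0700] "GET /index.html HTTP/1.0" 200 1043 "-" "Scooter/2.0 G.R.A.B. X2.0"
'''

# Substrings that often show up in robot User-Agents; necessarily incomplete.
ROBOT_HINTS = ("bot", "spider", "crawl", "scooter", "slurp")

# In combined log format, the last double-quoted field is the User-Agent.
AGENT_RE = re.compile(r'"([^"]*)"\s*$')

def robot_agents(log_text):
    """Return the set of User-Agent strings that look like robots."""
    agents = set()
    for line in log_text.splitlines():
        m = AGENT_RE.search(line)
        if m and any(hint in m.group(1).lower() for hint in ROBOT_HINTS):
            agents.add(m.group(1))
    return agents
```

Run your real access log through something like that and you get a first-cut list of agents to exclude, which you can then prune by hand.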

When you think you have a handle on all the robots out there, it's 
time to visit BotSpot at http://www.botspot.com/ so you can see that 
any attempt at a comprehensive accounting of robot activity is 
basically doomed.

Personally, I would go with Aengus Lawlor's approach: "But all well 
behaved spiders request /ROBOTS.TXT before anything else on your 
server, so if you do a report on just that file, you should get a 
quick run down of the UserAgents or IP addresses to exclude."
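That report is also easy to script yourself. Here's a minimal Python sketch, assuming Apache combined log format (the sample lines are invented for illustration):

```python
# Made-up sample in Apache combined log format.
LOG = '''\
1.2.3.4 - - [15/Dec/1999:11:33:01 -0700] "GET / HTTP/1.0" 200 1043 "-" "Mozilla/4.7 [en] (WinNT; I)"
5.6.7.8 - - [15/Dec/1999:11:33:05 -0700] "GET /robots.txt HTTP/1.0" 200 68 "-" "Scooter/2.0 G.R.A.B. X2.0"
'''

def robots_txt_clients(log_text):
    """Return (IP, User-Agent) pairs for every request to /robots.txt."""
    clients = set()
    for line in log_text.splitlines():
        fields = line.split('"')
        # In combined format, fields[1] is the request line
        # ("GET /path HTTP/1.0") and fields[-2] is the User-Agent.
        if len(fields) >= 7 and fields[1].split()[1] == "/robots.txt":
            clients.add((line.split()[0], fields[-2]))
    return clients
```

Once you have that list of agents and addresses, excluding them from your page-view report is straightforward.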

Regards,

Ron Bodtcher


At 11:33 AM 12/15/99 -0700, you wrote:
>Boris Goldowsky wrote:
>
>> I've been asked to report page-view statistics for our web site
>> which eliminate page views from search-engine spiders and other robots.
>>
>> I've tried to do some of this by coming up with a list of User-Agent
>> strings that look like spiders, but it seems like a hit-or-miss sort
>> of approach.
>>
>> Does anyone know if there are standard lists of spider vs. interactive
>> User-Agents, or is there a better method of filtering out spider page
>> views?  Any config-file snippets people would be willing to share?
>>
>
>There is no standard list that I am aware of, and the list is constantly
>changing. You might check http://www.searchenginewatch.com for a partial list
>of spider User Agents. Unfortunately, some "spiders" will represent
>themselves as "interactive user agents", like Netscape Navigator or IE so
>that they can be sure to receive the content intended for such interactive
>agents.
>
>I think there is also a partial list of spider user agents on the helper
>applications page for Analog. Otherwise, you pretty much just have to try and
>guess and make the best estimates you can.
>
>HTH,
>
>
>--
>Jeremy Wadsack
>Digital Media Consultant
>___________________________
>Wadsack-Allen Digital Group
>http://www.wadsack-allen.com/digitalgroup/
>
>
>------------------------------------------------------------------------
>This is the analog-help mailing list. To unsubscribe from this
>mailing list, send mail to [EMAIL PROTECTED]
>with "unsubscribe analog-help" in the main BODY OF THE MESSAGE.
>List archived at http://www.mail-archive.com/[email protected]/
>------------------------------------------------------------------------
>
