Dorothea Salo wrote:
> 2008/8/26 Mark H. Wood <[EMAIL PROTECTED]>:
>> On Tue, Aug 26, 2008 at 10:07:43AM -0500, Tim Donohue wrote:
>>> So, although I think it was already mentioned, I'd add as a requirement
>>> for a good Statistics Package:
>>>
>>> * Must filter out web-crawlers in a semi-automated fashion!
>>
>> +1!  Suggestions as to how?
>
> The site <http://www.user-agents.org/> maintains a list of
> user-agents, classified by type. They have an XML-downloadable version
> at <http://www.user-agents.org/allagents.xml>, as well as an RSS-feed
> updater. Perhaps polling this would be a useful starting point?
>
> Dorothea
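[Editor's note: a minimal sketch of the "poll the user-agents.org list" idea above, in Java since that is DSpace's language. It assumes allagents.xml contains <user-agent> entries with <String> and <Type> children, with "R" marking robots; check the actual schema before relying on it.]

    import java.io.InputStream;
    import java.net.URL;
    import java.util.HashSet;
    import java.util.Set;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class CrawlerList {
        // Download allagents.xml and collect the user-agent strings
        // flagged as robots, for later lookup when counting hits.
        public static Set<String> loadCrawlerAgents() throws Exception {
            Set<String> agents = new HashSet<String>();
            URL url = new URL("http://www.user-agents.org/allagents.xml");
            InputStream in = url.openStream();
            try {
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder().parse(in);
                NodeList entries = doc.getElementsByTagName("user-agent");
                for (int i = 0; i < entries.getLength(); i++) {
                    Element entry = (Element) entries.item(i);
                    String type = textOf(entry, "Type");   // assumed: "R" = robot
                    String agent = textOf(entry, "String");
                    if (type != null && type.contains("R") && agent != null) {
                        agents.add(agent.toLowerCase());
                    }
                }
            } finally {
                in.close();
            }
            return agents;
        }

        private static String textOf(Element parent, String tag) {
            NodeList nodes = parent.getElementsByTagName(tag);
            return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
        }
    }

A statistics job could refresh this set on a schedule (the RSS feed mentioned above could trigger the refresh) and skip any log entry whose User-Agent appears in the set.]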
University of Minho's Statistics Add-On for DSpace can do some basic automated filtering of web crawlers. See its list of main features on the DSpace Wiki:

http://wiki.dspace.org/index.php//StatisticsAddOn

(It looks like they detect spiders by the way spiders tend to identify themselves. Most "nice" spiders, like Google's, identify themselves in a common fashion in the User-Agent header, e.g. "Googlebot".)

Frankly, although our statistics for IDEALS are nice looking, Minho's work is much more extensive and offers a greater variety of features (from what I've seen/heard of it). It's just missing our "Top 10 Downloads" list :)

- Tim

--
Tim Donohue
Research Programmer, Illinois Digital Environment for Access to Learning and Scholarship (IDEALS)
University of Illinois at Urbana-Champaign
[EMAIL PROTECTED] | (217) 333-4648

_______________________________________________
Dspace-general mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/dspace-general
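[Editor's note: the check Tim describes, matching a request's User-Agent against tokens that well-behaved spiders advertise, might look roughly like the sketch below. The token list is illustrative only and is not the Minho add-on's actual implementation.]

    import java.util.Arrays;
    import java.util.List;

    public class SpiderFilter {
        // Tokens commonly found in crawler User-Agent strings (illustrative).
        private static final List<String> ROBOT_TOKENS = Arrays.asList(
                "googlebot", "slurp", "msnbot", "bot", "crawler", "spider");

        public static boolean isSpider(String userAgent) {
            if (userAgent == null) {
                return true; // a missing User-Agent is rarely a real browser
            }
            String ua = userAgent.toLowerCase();
            for (String token : ROBOT_TOKENS) {
                if (ua.contains(token)) {
                    return true;
                }
            }
            return false;
        }
    }

A statistics recorder would then count a download only when !isSpider(request.getHeader("User-Agent")), combining this heuristic with the user-agents.org list for the "semi-automated" filtering requested earlier in the thread.]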
