I've developed a throttle to slow down the fetches from crawlers. It is configured in the dspace-web.xml file as a filter with 3 parameters:
1. PERIOD, the time window over which hits are counted.
2. HITS, the number of hits to allow per PERIOD. For example, with PERIOD set to 10 seconds and HITS set to 20, a given IP address is allowed 20 hits in that period. If it exceeds that, the system denies access to that IP address.
3. The length of time an IP address that exceeds the hit limit stays blocked before it can regain access.

I got some of the code for this from the Tapir project. If this is something you are interested in, I can send you the code I have.

-Jose

-----Original Message-----
From: Cory Snavely [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 21, 2007 9:22 AM
To: Jose Blanco
Subject: [Fwd: [Dspace-tech] Keeping spiders out of the statistics]

You should consider posting about what you developed.

-------- Forwarded Message --------
From: Mark H. Wood <[EMAIL PROTECTED]>
To: [email protected]
Subject: [Dspace-tech] Keeping spiders out of the statistics
Date: Tue, 20 Mar 2007 15:13:08 -0400

Has anyone found a fairly good automatic method of maintaining a list of spider addresses, for ignoring hits from web indexing activities when counting document fetches?

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
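To illustrate the three filter parameters described at the top of this message, here is a minimal Java sketch of the counting and blocking logic. It is not the code offered above, and it is not taken from DSpace or Tapir: the class HitThrottle, its allow method, and the parameter names are all hypothetical. The current time is passed in explicitly so the behaviour is easy to check without waiting.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-IP throttle; an illustrative sketch, not DSpace or Tapir code. */
final class HitThrottle {

    /** Per-IP bookkeeping. */
    private static final class Entry {
        long windowStart;   // start of the current counting period, in ms
        int hits;           // hits counted in the current period
        long blockedUntil;  // time in ms until which the IP is denied; 0 if not blocked
    }

    private final long periodMillis; // PERIOD
    private final int maxHits;       // HITS allowed per PERIOD
    private final long blockMillis;  // how long an offending IP stays blocked

    // Note: entries are never evicted here; a real filter would need to prune old ones.
    private final Map<String, Entry> entries = new HashMap<>();

    HitThrottle(long periodMillis, int maxHits, long blockMillis) {
        this.periodMillis = periodMillis;
        this.maxHits = maxHits;
        this.blockMillis = blockMillis;
    }

    /** Returns true if a hit from this IP at the given time should be served. */
    synchronized boolean allow(String ip, long nowMillis) {
        Entry e = entries.computeIfAbsent(ip, k -> {
            Entry created = new Entry();
            created.windowStart = nowMillis;
            return created;
        });

        // Still inside the block time: deny without counting.
        if (nowMillis < e.blockedUntil) {
            return false;
        }

        // A block has just expired: start again with a clean period.
        if (e.blockedUntil != 0) {
            e.blockedUntil = 0;
            e.windowStart = nowMillis;
            e.hits = 0;
        }

        // The current period is over: start a new one.
        if (nowMillis - e.windowStart >= periodMillis) {
            e.windowStart = nowMillis;
            e.hits = 0;
        }

        e.hits++;
        if (e.hits > maxHits) {
            // Limit exceeded: block this IP for the configured time.
            e.blockedUntil = nowMillis + blockMillis;
            return false;
        }
        return true;
    }
}
```

In a servlet filter, doFilter could call allow(request.getRemoteAddr(), System.currentTimeMillis()) and send an error response when it returns false, with the three constructor arguments read from the filter's init parameters. That wiring is an assumption about how such a filter might be put together, not a description of the actual one.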

