I've developed a throttle to slow down fetches from crawlers.  It is
configured in the dspace-web.xml file as a filter with 3 parameters:

1. PERIOD: the length of the time window over which hits are counted.
2. HITS: the number of hits to allow per PERIOD.  For example, with PERIOD
set to 10 seconds and HITS set to 20, the filter allows 20 hits from a
given IP address within that period.  If that limit is exceeded, the
system denies access to that IP address.
3. The length of time that an IP address that exceeded the hit limit stays
blocked before it can regain access.
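
To make the three parameters concrete, here is a rough sketch of the
counting logic.  It is not the code I actually have, and nothing in it is
taken from Tapir; the class name CrawlerThrottle, the method allow(), and
the field names are made up for illustration.  It assumes a fixed window
per IP address and leaves out the servlet filter wiring entirely.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-IP throttle sketch; all names here are assumptions. */
public class CrawlerThrottle {
    private final long periodMillis; // PERIOD: length of the counting window
    private final int maxHits;       // HITS: hits allowed per window
    private final long blockMillis;  // how long an offending IP stays blocked

    /** Per-IP bookkeeping. */
    private static final class State {
        long windowStart;   // when the current window began
        int hits;           // hits counted in the current window
        long blockedUntil;  // 0 when the IP is not blocked
    }

    private final Map<String, State> states = new HashMap<>();

    public CrawlerThrottle(long periodMillis, int maxHits, long blockMillis) {
        this.periodMillis = periodMillis;
        this.maxHits = maxHits;
        this.blockMillis = blockMillis;
    }

    /** Returns true if a request from this IP at time nowMillis may be served. */
    public synchronized boolean allow(String ip, long nowMillis) {
        State s = states.get(ip);
        if (s == null) {
            s = new State();
            s.windowStart = nowMillis;
            states.put(ip, s);
        }
        if (nowMillis < s.blockedUntil) {
            return false; // still inside the block time
        }
        if (s.blockedUntil != 0) {
            // Block has expired: start over with a fresh window.
            s.blockedUntil = 0;
            s.windowStart = nowMillis;
            s.hits = 0;
        }
        if (nowMillis - s.windowStart >= periodMillis) {
            // The previous window is over: begin a new one.
            s.windowStart = nowMillis;
            s.hits = 0;
        }
        s.hits++;
        if (s.hits > maxHits) {
            // Limit exceeded: deny this hit and block the IP.
            s.blockedUntil = nowMillis + blockMillis;
            return false;
        }
        return true;
    }
}
```

A filter would call allow() with the remote address of each request and
deny the request when it returns false.  Note that this sketch never
removes old entries from the map, so a real version would need some kind
of cleanup.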

I got some of the code for this from the Tapir project.

If you are interested in this, I can send you the code I have.

-Jose
-----Original Message-----
From: Cory Snavely [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, March 21, 2007 9:22 AM
To: Jose Blanco
Subject: [Fwd: [Dspace-tech] Keeping spiders out of the statistics]

You should consider posting about what you developed. 

-------- Forwarded Message --------
From: Mark H. Wood <[EMAIL PROTECTED]>
To: [email protected]
Subject: [Dspace-tech] Keeping spiders out of the statistics
Date: Tue, 20 Mar 2007 15:13:08 -0400

Has anyone found a fairly good automatic method of maintaining a list
of spider addresses, for ignoring hits from web indexing activities
when counting document fetches?

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________ DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech




