On Sun, Feb 13, 2011 at 02:45, Jeffrey Trimble <[email protected]> wrote:
> We've had to block several sites (certain web crawlers causing us headaches,
> and not the legitimate ones) using IPSec. Of course it blocks them from everything.
> That's one option, though a little severe, IMHO.
Right, blocking a web spider is a case where you really could consider blocking a single IP from DSpace. The problem with blocking a single IP is that the attacker's IP may change over time.

If you want to block a well-behaved spider, it should respect robots.txt. You can find the spider's name in your Apache access logs, and then block just that one robot.

In DSpace, place robots.txt in these locations:

[dspace]/webapps/jspui/robots.txt
[dspace]/webapps/xmlui/static/robots.txt

The contents would look like this (with the name of the spider from your logs):

User-agent: BadBot
Disallow: /

Details here:
http://www.robotstxt.org/
http://www.robotstxt.org/faq/blockjustbad.html

Regards,
~~helix84

_______________________________________________
DSpace-tech mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dspace-tech
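The "find the spider's name in your Apache access logs" step can be sketched like this. It assumes the Apache "combined" log format, where the user-agent is the sixth double-quote-delimited field; the log path and sample lines here are made up for illustration, so substitute your real log file (often /var/log/apache2/access.log or similar).

```shell
# Sample log lines standing in for a real Apache combined-format access log
# (hypothetical data, just to make the sketch self-contained):
cat > /tmp/access.log <<'EOF'
1.2.3.4 - - [13/Feb/2011:02:45:00 +0100] "GET /jspui/ HTTP/1.1" 200 512 "-" "BadBot/1.0"
1.2.3.4 - - [13/Feb/2011:02:45:01 +0100] "GET /jspui/handle/1/2 HTTP/1.1" 200 1024 "-" "BadBot/1.0"
5.6.7.8 - - [13/Feb/2011:02:46:00 +0100] "GET /xmlui/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
EOF

# Split each line on double quotes; field 6 is the user-agent string.
# Count requests per agent, busiest first -- the top entries are your
# candidates for a User-agent line in robots.txt.
awk -F'"' '{print $6}' /tmp/access.log | sort | uniq -c | sort -rn
```

With the sample data above, BadBot/1.0 comes out on top with two requests, which is the name you would then put on the User-agent line.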

