You need to put a robots.txt file in the web server's document root, so that it is served as /robots.txt, with the following contents:
#
# keep all robots out of entire site
User-agent: *
Disallow: /
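
If you want to check the rules before deploying them, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the user-agent strings and the example.com URL are only placeholders for illustration, not anything from your setup. Keep in mind that robots.txt is advisory: well-behaved crawlers honor it, but a crawler that ignores it will not be stopped by this file alone.

```python
from urllib.robotparser import RobotFileParser

# The same rules as above: keep all robots out of the entire site.
ROBOTS_TXT = """\
#
# keep all robots out of entire site
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots_txt may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file contents as a list of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical user agents and URL, used only to exercise the rules.
    for agent in ("Googlebot", "SomeOtherBot"):
        allowed = is_allowed(ROBOTS_TXT, agent, "http://www.example.com/index.html")
        print(f"{agent}: allowed={allowed}")
```

With these rules every compliant crawler should be reported as not allowed, whatever path is requested.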
-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED]] On Behalf Of Levy, Alan
Sent: Wednesday, January 16, 2008 8:24 AM
To: [email protected]
Subject: web crawling problem
I have a server that gets about 1M hits per day. Over the past week,
this has exploded and the server is using about 80% of the CPU. We
figure that someone is using a web crawler, since when we analyze the
Tomcat logs, there are thousands of hits from one IP address (every day
it's a different IP address).
Is there an open source or commercial product that will stop this?
Tia
----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or
visit
http://www.marist.edu/htbin/wlvindex?LINUX-390