Re: web crawling problem

Robert Flynn Wed, 16 Jan 2008 05:58:10 -0800

You need to put a robots.txt file in the web server's document root (so
that it is served as /robots.txt) with the following:


#
# keep all robots out of entire site
#
User-agent: *
Disallow: /
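
To check what these rules do before deploying them, here is a minimal
sketch using Python's standard urllib.robotparser module. The host
www.example.com and the helper name is_allowed are placeholders for
illustration only, not anything from the original setup. Keep in mind
that robots.txt is advisory: well-behaved crawlers honor it, but a
crawler that ignores it will not be stopped by this file alone.

```python
from urllib.robotparser import RobotFileParser

# The same rules as above, inlined so the check runs without a network fetch.
ROBOTS_TXT = """\
#
# keep all robots out of entire site
#
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url.

    is_allowed is a hypothetical helper name used only for this sketch.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # www.example.com is a placeholder host.
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, is_allowed(agent, "http://www.example.com/index.html"))
```

With "Disallow: /" under "User-agent: *", every path is reported as off
limits for every compliant robot.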

 

-----Original Message-----
From: Linux on 390 Port [mailto:[EMAIL PROTECTED] On Behalf Of
Levy, Alan
Sent: Wednesday, January 16, 2008 8:24 AM
To: [email protected]
Subject: web crawling problem

I have a server that gets about 1M hits per day. Over the past week,
this has exploded and the server is using about 80% of the CPU. We
figure that someone is using a webcrawler since when we analyze the
Tomcat logs, there are thousands of hits from one IP address (every day
it's a different IP address).

Is there an open source or commercial product that will stop this?

TIA

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390


