On Wed, Nov 24, 1999 at 05:26:08AM +0100, Gerald Richter wrote:
> > I have for the first time encountered the problem that some braindead
> > web robot (ExtractorPro) attempted to download all of the site and
> > appended some random URL segment at the end of an embedded perl page. I
> > use suffix .phtml for these pages, and the url looked like
> > <http://mysite//page.phtml/randomotherurl>. The innocent embperl page
> > delivered some contents with relative urls and the robot continued to
> > fetch the same page with various URL suffixes, causing a loop and doing
> > the equivalent of an Apache bench remotely.
> >
> > What is the best way to stop these kinds of mishaps? And what the heck
> > is this ExtractorPro thing?
> >
> 
> Maybe Apache::SpeedLimit is helpful. It limits the number of pages one
> client can fetch per time period. There are other Apache modules to block
> robots; have a look at the Apache module list.

That would be necessary if this robot were sucking up all my CPU
power/bandwidth, but it did not turn out that way. I think I will have to
put some code into my Embperl pages to return an error if there is path
info for a script that is not supposed to have one.

-- 
Jens-Uwe Mager

HELIOS Software GmbH
Steinriede 3
30827 Garbsen
Germany

Phone:          +49 5131 709320
FAX:            +49 5131 709325
Internet:       [EMAIL PROTECTED]
